A core functionality of ancpBIDS is to query information in a BIDS-compatible dataset. Any file in a BIDS dataset that does not conform to the BIDS naming convention is made available as a File object instead of an Artifact. This also makes it possible to process ordinary files that are not (yet) part of the BIDS specification.

Load an existing BIDS dataset

To get started, we download a test dataset. These datasets are only meant for experimenting with ancpBIDS and are not expected to be used in research.

from ancpbids import utils
dataset_path = utils.fetch_dataset('ds005')

If fetching the dataset succeeds, you will find a zip file in the folder ‘~/.ancp-bids/datasets’ in your home directory. The contents of that zip file are extracted to ‘~/.ancp-bids/datasets/ds005’ or ‘~/.ancp-bids/datasets/ds003483’, depending on the dataset you requested. If the dataset has already been downloaded by a previous call to fetch_dataset(), it will not be downloaded again.

If you have your own BIDS dataset, feel free to use it instead, but do not forget to adapt the following code to your specific dataset.

Now that we have the path to the dataset folder, we read information about the BIDS dataset into a layout object using BIDSLayout. BIDSLayout takes the absolute or relative path to a BIDS dataset as input and returns a layout in which most of the information about the dataset is held in memory.

from ancpbids import BIDSLayout
layout = BIDSLayout(dataset_path)

Note that, in order to optimize speed, BIDSLayout does not perform a deep search within data files. It only reads and parses the files and directories necessary to gather the information defined in the BIDS specification.
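
To illustrate what indexing without a deep search can look like, here is a minimal sketch that uses only the Python standard library. It is not the ancpBIDS implementation: the helper parse_bids_name() is hypothetical, and the placeholder files are created empty to make clear that only file and directory names are inspected.

```python
import tempfile
from pathlib import Path

# Hypothetical helper, not part of ancpBIDS: split a BIDS-style file name
# into its key-value entities, its suffix and its extension.
def parse_bids_name(name):
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    entities = dict(p.split("-", 1) for p in parts if "-" in p)
    suffix = parts[-1] if "-" not in parts[-1] else None
    return {"entities": entities, "suffix": suffix, "extension": dot + ext}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    func_dir = root / "sub-01" / "func"
    func_dir.mkdir(parents=True)
    # Empty placeholder files: their contents are never opened below.
    (func_dir / "sub-01_task-mixedgamblestask_run-01_bold.nii.gz").touch()
    (func_dir / "sub-01_task-mixedgamblestask_run-02_bold.nii.gz").touch()

    # Walk the tree and parse names only.
    index = [parse_bids_name(p.name) for p in sorted(root.rglob("*")) if p.is_file()]

print(index[0]["entities"])
print(index[0]["suffix"], index[0]["extension"])
```

Because nothing is read from the image files themselves, the cost of building such an index grows with the number of files, not with their size.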

Perform some basic queries

We can now use several functions to query information about the dataset. For example, we can ask which subjects are in the dataset:

subjects = layout.get_subjects()
# ds005: ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16']
# ds003483: ['009', '012', '013', '014', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030', '031']

This will provide a list of all subject names in the dataset.

Next, let us see how many runs there are:

runs = layout.get_runs()
#['01', '02', '03']

Note that the returned runs are collected over all subjects, i.e. it is not guaranteed that each participant has the same number of runs.

Now, let’s check out the tasks of the experiment:

tasks = layout.get_tasks()

These simple queries should support most of the entities defined in BIDS. The queries are constructed as layout.get_NameOfTheEntity(). The query will return ‘[]’ (empty list) if the entity does not exist in the dataset or if a wrong string was provided as part of the ‘get_’ call.
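
The naming pattern can be pictured with a small toy class. This is only a sketch of one way such dynamic get_ queries could be built in Python; the class ToyLayout, its ALIASES table and the method get_values() are hypothetical and do not describe how ancpBIDS is implemented.

```python
from functools import partial

class ToyLayout:
    """Hypothetical stand-in for a layout; not the ancpBIDS implementation."""

    # Assumed mapping from plural query names to entity keys.
    ALIASES = {"subjects": "sub", "runs": "run", "tasks": "task"}

    def __init__(self, records):
        # records: one dict of entity key -> value per file
        self.records = records

    def get_values(self, entity):
        # Sorted unique values; an unknown entity simply yields [].
        return sorted({r[entity] for r in self.records if entity in r})

    def __getattr__(self, name):
        # Only called for attributes that are not found the normal way.
        if name.startswith("get_"):
            key = name[len("get_"):]
            return partial(self.get_values, self.ALIASES.get(key, key))
        raise AttributeError(name)

toy = ToyLayout([
    {"sub": "01", "task": "mixedgamblestask", "run": "01"},
    {"sub": "01", "task": "mixedgamblestask", "run": "02"},
    {"sub": "02", "task": "mixedgamblestask", "run": "01"},
])
print(toy.get_subjects())  # ['01', '02']
print(toy.get_runs())      # ['01', '02']
print(toy.get_sesions())   # misspelled on purpose: []
```

The last call shows why a typo in the query name yields an empty list instead of an error: the name is treated as just another entity that happens to have no values.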

To get an idea of the entities you can query for in your dataset, use the layout.get_entities() function. It returns a dictionary with all entities defined in the dataset and their respective values.

avail_entities = layout.get_entities()
print("Entities: ", list(avail_entities.keys()))
print("Value of task: ", avail_entities['task'])
#Entities:  ['task', 'sub', 'run', 'ds', 'type']
#Value of task:  {'mixedgamblestask'}

Note that BIDS allows the definition of non-standard labels and indexes in filenames.
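
As a rough illustration of how such a dictionary of entities can be collected from filenames alone, the following standard-library sketch gathers every key-value pair it encounters. The file names are made up, and the key `mylabel` is a hypothetical non-standard label; ancpBIDS itself may collect entities differently.

```python
from collections import defaultdict

# Made-up file names; 'mylabel' is a hypothetical non-standard key.
filenames = [
    "sub-01_task-mixedgamblestask_run-01_bold.nii.gz",
    "sub-01_task-mixedgamblestask_run-02_bold.nii.gz",
    "sub-02_task-mixedgamblestask_run-01_mylabel-x_bold.nii.gz",
]

entities = defaultdict(set)
for name in filenames:
    stem = name.split(".", 1)[0]          # drop the extension
    for part in stem.split("_"):
        if "-" in part:                   # key-value pair such as sub-01
            key, value = part.split("-", 1)
            entities[key].add(value)

print(sorted(entities))    # ['mylabel', 'run', 'sub', 'task']
print(entities["task"])    # {'mixedgamblestask'}
```

Since every key-value pair is collected, non-standard keys appear in the result next to the standard ones.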

Querying for metadata

Metadata from JSON files can be queried using layout.get_metadata(entity='abc', suffix='xyz'). It returns a dictionary of metadata keys and their values.

metadata = layout.get_metadata(task='mixedgamblestask', suffix='bold')
print("metadata: ", list(metadata.keys()))
print("Value of RepetitionTime: ", metadata['RepetitionTime'])
#metadata:  ['RepetitionTime', 'TaskName', 'SliceTiming']
#Value of RepetitionTime:  2.0
# The same kind of query for the MEG dataset ds003483:
metadata = layout.get_metadata(task='induction', suffix='meg')
print("metadata: ", list(metadata.keys()))
print("Value of Dewar position: ", metadata['DewarPosition'])
#metadata:  ['AssociatedEmptyRoom', 'CapManufacturer',
#'CapManufacturersModelName', 'ContinuousHeadLocalization',
#'DeviceSerialNumber', 'DewarPosition', 'DigitizedHeadPoints',
#'DigitizedLandmarks', 'ECGChannelCount', 'ECOGChannelCount',
#'EEGChannelCount', 'EEGPlacementScheme', 'EEGReference',
#'EMGChannelCount', 'EOGChannelCount', 'EpochLength',
#'HeadCoilFrequency', 'InstitutionAddress', 'InstitutionName',
#'InstitutionalDepartmentName', 'Instructions', 'MEGChannelCount',
#'MEGREFChannelCount', 'Manufacturer', 'ManufacturersModelName',
#'MiscChannelCount', 'PowerLineFrequency', 'RecordingDuration',
#'RecordingType', 'SEEGChannelCount', 'SamplingFrequency',
#'SoftwareFilters', 'SoftwareVersions', 'SubjectArtefactDescription',
#'TaskDescription', 'TaskName', 'TriggerChannelCount', 'Description',
#'RawSources', 'Authors', 'BaselineCorrection', 'BaselineCorrectionMethod',
#Value of Dewar position: 'upright'
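
A metadata dictionary of this kind can be thought of as the result of merging JSON sidecar files. The sketch below follows the BIDS inheritance principle in a strongly simplified form (sidecars deeper in the tree override shallower ones) and uses only the standard library. The file contents are made-up values, and the merge logic is an assumption for illustration, not a description of how get_metadata() works internally.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01" / "func").mkdir(parents=True)
    # Dataset-level sidecar shared by all runs of the task (made-up values).
    (root / "task-mixedgamblestask_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.0, "TaskName": "toy task"}))
    # Hypothetical subject-level sidecar that overrides one field.
    (root / "sub-01" / "func" / "sub-01_task-mixedgamblestask_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.5}))

    metadata = {}
    # Shallow paths first, so deeper (more specific) sidecars win.
    for sidecar in sorted(root.rglob("*_bold.json"), key=lambda p: len(p.parts)):
        metadata.update(json.loads(sidecar.read_text()))

print(metadata)  # {'RepetitionTime': 2.5, 'TaskName': 'toy task'}
```

A real implementation would also have to check that the entities of a sidecar match those of the data file; the sketch skips that step.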

Retrieving matching filenames

The layout.get() function allows for more complex queries and can return a list of files matching the query.

file_paths = layout.get(suffix='bold', subject='02', return_type='filename')
print("BOLD files of subject 2:", *file_paths, sep='\n')
#BOLD files of subject 2:

You can also specify lists of search items like subject=['02','03'] in the above statement. This will retrieve all the BOLD files of subjects 02 and 03.

file_paths = layout.get(suffix='meg', subject='009', return_type='filename')
print("MEG files of subject 009:", *file_paths, sep='\n')
#MEG files of subject 009:

The get() function can simultaneously search for matches in the following fields:

  1. scope: The BIDS subdirectory to be searched. Can be any of ‘raw’ | ‘derivatives’

  2. entities: The key-value pairs in filenames, as defined in BIDS. Examples are ‘sub’ or ‘run’. Use layout.get_entities() to get the entities available in the dataset.

  3. suffix: Suffixes define the imaging modality or data type. Examples are ‘bold’ or ‘meg’ but also ‘events’ or ‘participants’

  4. extension: The file extension. Examples are ‘.nii’ or ‘.nii.gz’ for MRI and ‘.fif’ for MEG.

  5. return_type: Defines what get() returns. This can be ‘filename’ or ‘dict’, where ‘dict’ is the default.

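# suffix and return_type are example filters; adapt them to your dataset
bold_files = layout.get(scope='raw',
                    suffix='bold', return_type='filename',
                    run=["01", "02"])
print(*bold_files, sep='\n')
meg_timeseries_files = layout.get(scope='raw',
                    suffix='meg', return_type='filename')
print(*meg_timeseries_files, sep='\n')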
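
To make the matching rules more concrete, the following toy function combines several filters, accepts either a single value or a list of values per filter, and switches its result between records and file paths. It is a self-contained sketch under stated assumptions: toy_get() and the in-memory records are hypothetical and are not the ancpBIDS implementation of get().

```python
# Hypothetical in-memory index: one record per file.
records = [
    {"path": "sub-02/func/sub-02_task-mixedgamblestask_run-01_bold.nii.gz",
     "scope": "raw", "sub": "02", "run": "01", "suffix": "bold", "extension": ".nii.gz"},
    {"path": "sub-02/func/sub-02_task-mixedgamblestask_run-02_bold.nii.gz",
     "scope": "raw", "sub": "02", "run": "02", "suffix": "bold", "extension": ".nii.gz"},
    {"path": "sub-03/func/sub-03_task-mixedgamblestask_run-01_bold.nii.gz",
     "scope": "raw", "sub": "03", "run": "01", "suffix": "bold", "extension": ".nii.gz"},
    {"path": "sub-03/func/sub-03_task-mixedgamblestask_run-01_events.tsv",
     "scope": "raw", "sub": "03", "run": "01", "suffix": "events", "extension": ".tsv"},
]

def toy_get(records, return_type="dict", **filters):
    """Keep records whose fields match every filter (single value or list)."""
    def matches(record):
        for key, wanted in filters.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if record.get(key) not in allowed:
                return False
        return True

    hits = [r for r in records if matches(r)]
    return [r["path"] for r in hits] if return_type == "filename" else hits

# All filters must match; a list means "any of these values".
paths = toy_get(records, return_type="filename",
                scope="raw", suffix="bold", sub=["02", "03"], run="01")
print(*paths, sep="\n")

events = toy_get(records, suffix="events")  # default: full records
```

Here the first query returns the run 01 BOLD paths of subjects 02 and 03, while the second returns the complete record of the single events file.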