# Task Components
As we've seen, a Bitfount task is the brain of a project. It specifies what will run on any dataset linked to the project, in what order, and over what view of the data. Tasks are written in YAML and, at a high level, are made up of three key components:
- Protocol: orchestrates the run and lifecycle
- Algorithm(s): the units of work to execute (can be a list)
- Data structure: how to select, assign and transform input data
A minimal skeleton might look something like this:
```yaml
task:
  protocol:
    name: bitfount.ResultsOnly
    arguments: { ... }
  algorithm:
    - name: bitfount.ModelInference
      arguments: { ... }
  data_structure:
    select:
      include:
        - image_path
```
## Protocols
- What they are: the task's entry point that orchestrates the algorithms and handles communication between different parties within the task. A given protocol will only be compatible with a certain set of algorithms.
- How to specify: each entry takes a `name` and `arguments`. Use the prefix `bitfount.` followed by the protocol name. A full list of protocols can be found here. The `arguments` are used to configure the protocol and may be optional; search for the protocol in the API documentation to see its available arguments.
- Examples:
  - `bitfount.InferenceAndCSVReport`: runs model inference and writes a CSV report from the results
```yaml
task:
  protocol:
    name: bitfount.InferenceAndCSVReport
    arguments: { ... }
```
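Because protocols and algorithms are only valid in compatible pairings, it can help to see them side by side. Below is a sketch pairing `bitfount.InferenceAndCSVReport` with the inference and reporting algorithms shown in the next section; the `{ ... }` placeholders stand in for real arguments:

```yaml
task:
  protocol:
    name: bitfount.InferenceAndCSVReport
    arguments: { ... }
  algorithm:
    - name: bitfount.ModelInference # produces the predictions
      arguments: { ... }
      model: { ... } # see "Referencing a model"
    - name: bitfount.CSVReportAlgorithm # writes those predictions to a CSV report
      arguments: { ... }
```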
## Algorithms
- What they are: the concrete steps executed by the protocol. You can supply a single algorithm or a list; a list runs the algorithms in order. How the output of one algorithm feeds into the next is baked into the protocol in which they are used, so a given algorithm will only be compatible with a certain set of protocols.
- How to specify: each entry takes a `name` and optional `arguments`. Use the prefix `bitfount.` followed by the algorithm name. A full list of algorithms can be found here. The `arguments` are used to configure the algorithm; search for the algorithm in the API documentation to see its available arguments. Algorithms that require a model to be passed in accept a separate `model` block; see Referencing a model for more information.
- Common patterns:
  - Model inference (e.g., `bitfount.ModelInference`, `bitfount.HuggingFaceImageClassificationInference`)
  - Post-processing (e.g., calculations, matching)
  - Reporting (e.g., `bitfount.CSVReportAlgorithm`)
```yaml
task:
  algorithm:
    - name: bitfount.ModelInference
      arguments: { ... }
      model: { ... } # see "Referencing a model"
    - name: bitfount.CSVReportAlgorithm
      arguments: { ... }
```
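Where the `model: { ... }` placeholder appears above, a full task supplies a model block. Its exact schema is documented in Referencing a model; the sketch below is illustrative only, and the `bitfount_model`, `model_ref`, and `username` keys are assumptions rather than confirmed schema:

```yaml
algorithm:
  - name: bitfount.ModelInference
    model:
      # Assumed keys, shown for shape only; consult "Referencing a model"
      # for the actual schema.
      bitfount_model:
        model_ref: MyModel    # hypothetical model name on the Bitfount Hub
        username: my-username # hypothetical Hub username
```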
## Data Structures
The data structure defines what the data should look like before it is passed to the algorithms in the task. More information about the data structure arguments can be found here. Note that the data structure is currently only used to define the input data for tasks that use a model.
- `table_config`: optional configuration to select a specific table from the datasource if the datasource has multiple tables.
- `select`: choose columns to include/exclude from the data; `include_prefix` can be helpful for datasets that have multiple image columns.
- `assign`: map column names to semantic roles (e.g., `image_prefix`, `target`).
- `transform`: define dataset/batch/image transforms to apply to the data (e.g., Albumentations pipelines, grayscale handling). Important for tasks that use a model. More information about the transform arguments can be found here.
- `data_split`: optional configuration for defining how to split data into train/validation/test sets.
- `compatible_datasources`: list of dataset types that are compatible with this data structure configuration.
- `schema_requirements`: the required dataset schema level (`"empty"`, `"partial"`, or `"full"`), or a dictionary mapping requirement levels to specific dataset types. Defaults to `"partial"`.
- `filter`: optional task-level filters to apply at runtime. These allow the task initiator to further restrict which data is processed, without modifying the dataset connection. See Task-level filters below for details.
```yaml
task:
  data_structure:
    compatible_datasources:
      - DICOMOphthalmologySource
      - HeidelbergSource
    schema_requirements: partial
    data_split:
      args:
        shuffle: false
        test_percentage: 100
        validation_percentage: 0
      data_splitter: percentage
    assign:
      image_prefix: Pixel Data
    select:
      include:
        - Columns
        - Rows
      include_prefix: Pixel Data
    transform:
      image:
        - albumentations:
            step: test
            output: true
            transformations:
              - ToTensorV2
```
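The example above uses the string form of `schema_requirements`. As noted in the field list, a dictionary mapping requirement levels to dataset types is also accepted. The sketch below is one plausible reading of that shape, reusing the datasource names from the example; verify the exact structure against the data structure documentation:

```yaml
data_structure:
  schema_requirements:
    empty:
      - DICOMOphthalmologySource # assumed: no schema required for this type
    partial:
      - HeidelbergSource # assumed: partial schema required for this type
```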
## Task-level Filters
Task-level filters allow the task initiator to specify data filtering criteria at runtime, without requiring the dataset owner to modify their dataset connection. This is useful when the same dataset needs to be queried with different criteria across different task runs.
Task-level filters are applied in addition to any dataset-level filters configured by the dataset owner at connection time. The resulting filter is the intersection of both: task-level filters can only further restrict the data, never expand it beyond what the dataset owner has allowed.
Filters are specified as a list of filter objects, each containing a `filter_type` and a `value`:
```yaml
task:
  data_structure:
    filter:
      - filter_type: modality
        value: OCT
      - filter_type: min-frames
        value: 50
      - filter_type: scan-acquisition-min-date
        value:
          year: 2020
          month: 1
```
### Available filter types
| Filter Type | Value Type | Description |
|---|---|---|
| `modality` | `"OCT"` or `"SLO"` | Filter by imaging modality |
| `min-frames` | integer | Minimum number of B-scan frames |
| `max-frames` | integer | Maximum number of B-scan frames |
| `min-file-size` | number (MB) | Minimum file size in megabytes |
| `max-file-size` | number (MB) | Maximum file size in megabytes |
| `file-creation-min-date` | date object | Earliest file creation date |
| `file-creation-max-date` | date object | Latest file creation date |
| `file-modification-min-date` | date object | Earliest file modification date |
| `file-modification-max-date` | date object | Latest file modification date |
| `min-dob` | date object | Earliest patient date of birth |
| `max-dob` | date object | Latest patient date of birth |
| `scan-acquisition-min-date` | date object | Earliest scan acquisition date |
| `scan-acquisition-max-date` | date object | Latest scan acquisition date |
| `check-required-fields` | list of strings | Required DICOM fields that must be present |
| `series-description` | string | Filter by DICOM series description |
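Non-date filters follow the same `filter_type`/`value` shape, with list-valued filters taking a YAML list. A sketch using illustrative values (the DICOM field names and series description below are examples only):

```yaml
task:
  data_structure:
    filter:
      - filter_type: check-required-fields
        value: # DICOM fields that must be present in each file
          - PatientID
          - StudyDate
      - filter_type: series-description
        value: Macular Cube # filter by the DICOM series description
```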
Date values are specified as objects with a `year` (required) and optional `month` and `day` fields:
```yaml
value:
  year: 2023
  month: 6
  day: 15
```
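For instance, a sketch that bounds scan acquisition dates on both ends, with a year-only lower bound and a full-date upper bound:

```yaml
filter:
  - filter_type: scan-acquisition-min-date
    value:
      year: 2022 # month and day are optional
  - filter_type: scan-acquisition-max-date
    value:
      year: 2023
      month: 6
      day: 15
```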