# Task Components
As we've seen, a Bitfount task is the brain of a project. It specifies what will run on any dataset linked to the project, in what order, and over what view of the data. Tasks are written in YAML and, at a high level, are made up of three key components:
- Protocol: orchestrates the run and lifecycle
- Algorithm(s): the units of work to execute (can be a list)
- Data structure: how to select, assign and transform input data
A minimal skeleton might look something like this:
```yaml
task:
  protocol:
    name: bitfount.ResultsOnly
    arguments: { ... }
  algorithm:
    - name: bitfount.ModelInference
      arguments: { ... }
  data_structure:
    select:
      include:
        - image_path
```
## Protocols
- What they are: the task's entry point that orchestrates the algorithms and handles communication between different parties within the task. A given protocol will only be compatible with a certain set of algorithms.
- How to specify: each entry takes a `name` and `arguments`. Use the prefix `bitfount.` followed by the protocol name. A full list of protocols can be found here. The `arguments` are used to configure the protocol and may be optional; search for the protocol in the API documentation to see its available arguments.
- Examples:
  - `bitfount.InferenceAndCSVReport`: runs model inference and writes a CSV report from the results
```yaml
task:
  protocol:
    name: bitfount.InferenceAndCSVReport
    arguments: { ... }
```
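Because protocols and algorithms are only valid in compatible pairings, it can help to see them side by side. Below is a sketch pairing `bitfount.InferenceAndCSVReport` with the inference and reporting algorithms shown in the next section; the `{ ... }` placeholders stand in for real arguments:

```yaml
task:
  protocol:
    name: bitfount.InferenceAndCSVReport
    arguments: { ... }
  algorithm:
    - name: bitfount.ModelInference # produces the predictions
      arguments: { ... }
      model: { ... } # see "Referencing a model"
    - name: bitfount.CSVReportAlgorithm # writes those predictions to a CSV report
      arguments: { ... }
```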
## Algorithms
- What they are: the concrete steps executed by the protocol. You can supply a single algorithm or a list; a list runs the algorithms in order. How the output of one algorithm feeds into the next is baked into the protocol in which they are used, so a given algorithm will only be compatible with a certain set of protocols.
- How to specify: each entry takes a `name` and optional `arguments`. Use the prefix `bitfount.` followed by the algorithm name. A full list of algorithms can be found here. The `arguments` are used to configure the algorithm; search for the algorithm in the API documentation to see its available arguments. Algorithms that require a model to be passed in accept a separate `model` block; see Referencing a model for more information.
- Common patterns:
  - Model inference (e.g., `bitfount.ModelInference`, `bitfount.HuggingFaceImageClassificationInference`)
  - Post-processing (e.g., calculations, matching)
  - Reporting (e.g., `bitfount.CSVReportAlgorithm`)
```yaml
task:
  algorithm:
    - name: bitfount.ModelInference
      arguments: { ... }
      model: { ... } # see "Referencing a model"
    - name: bitfount.CSVReportAlgorithm
      arguments: { ... }
```
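Where the `model: { ... }` placeholder appears above, a full task supplies a model block. Its exact schema is documented in Referencing a model; the sketch below is illustrative only, and the `bitfount_model`, `model_ref`, and `username` keys are assumptions rather than confirmed schema:

```yaml
algorithm:
  - name: bitfount.ModelInference
    model:
      # Assumed keys, shown for shape only; consult "Referencing a model"
      # for the actual schema.
      bitfount_model:
        model_ref: MyModel    # hypothetical model name on the Bitfount Hub
        username: my-username # hypothetical Hub username
```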
## Data Structures
The data structure defines what the data should look like before it is passed to the algorithms in the task. More information about the data structure arguments can be found here. Note that the data structure is currently only used to define the input data for tasks that use a model.
- `table_config`: optional configuration to select a specific table from the datasource if the datasource has multiple tables.
- `select`: choose columns to include/exclude from the data; `include_prefix` can be helpful for datasets that have multiple image columns.
- `assign`: map column names to semantic roles (e.g., `image_prefix`, `target`).
- `transform`: define dataset/batch/image transforms to apply to the data (e.g., Albumentations pipelines, grayscale handling). Important for tasks that use a model. More information about the transform arguments can be found here.
- `data_split`: optional configuration for defining how to split data into train/validation/test sets.
- `compatible_datasources`: list of dataset types that are compatible with this data structure configuration.
- `schema_requirements`: the required dataset schema level (`"empty"`, `"partial"`, or `"full"`), or a dictionary mapping requirement levels to specific dataset types. Defaults to `"partial"`.
- `filter`: optional task-level filters to apply at runtime. These allow the task initiator to further restrict which data is processed, without modifying the dataset connection. See Task-level filters below for details.
```yaml
task:
  data_structure:
    compatible_datasources:
      - DICOMOphthalmologySource
      - HeidelbergSource
    schema_requirements: partial
    data_split:
      args:
        shuffle: false
        test_percentage: 100
        validation_percentage: 0
      data_splitter: percentage
    assign:
      image_prefix: Pixel Data
    select:
      include:
        - Columns
        - Rows
      include_prefix: Pixel Data
    transform:
      image:
        - albumentations:
            step: test
            output: true
            transformations:
              - ToTensorV2
```
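The example above uses the string form of `schema_requirements`. As noted in the field list, a dictionary mapping requirement levels to dataset types is also accepted. The sketch below is one plausible reading of that shape, reusing the datasource names from the example; verify the exact structure against the data structure documentation:

```yaml
data_structure:
  schema_requirements:
    empty:
      - DICOMOphthalmologySource # assumed: no schema required for this type
    partial:
      - HeidelbergSource # assumed: partial schema required for this type
```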
## Task-level Filters
Task-level filters allow the task initiator to specify data filtering criteria at runtime, without requiring the dataset owner to modify their dataset connection. This is useful when the same dataset needs to be queried with different criteria across different task runs.
Task-level filters are applied in addition to any dataset-level filters configured by the dataset owner at connection time. The resulting filter is the intersection of both: task-level filters can only further restrict the data, never expand it beyond what the dataset owner has allowed.
Filters are specified as a list of filter objects, each containing a `filter_type` and a `value`:
```yaml
task:
  data_structure:
    filter:
      - filter_type: modality
        value: OCT
      - filter_type: min-frames
        value: 50
      - filter_type: scan-acquisition-min-date
        value:
          year: 2020
          month: 1
```
### Available filter types
| Filter Type | Value Type | Description |
|---|---|---|
| `modality` | `"OCT"` or `"SLO"` | Filter by imaging modality |
| `min-frames` | integer | Minimum number of B-scan frames |
| `max-frames` | integer | Maximum number of B-scan frames |
| `min-file-size` | number (MB) | Minimum file size in megabytes |
| `max-file-size` | number (MB) | Maximum file size in megabytes |
| `file-creation-min-date` | date object | Earliest file creation date |
| `file-creation-max-date` | date object | Latest file creation date |
| `file-modification-min-date` | date object | Earliest file modification date |
| `file-modification-max-date` | date object | Latest file modification date |
| `min-dob` | date object | Earliest patient date of birth |
| `max-dob` | date object | Latest patient date of birth |
| `scan-acquisition-min-date` | date object | Earliest scan acquisition date |
| `scan-acquisition-max-date` | date object | Latest scan acquisition date |
| `check-required-fields` | list of strings | Required DICOM fields that must be present |
| `series-description` | string | Filter by DICOM series description |
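Non-date filters follow the same `filter_type`/`value` shape, with list-valued filters taking a YAML list. A sketch using illustrative values (the DICOM field names and series description below are examples only):

```yaml
task:
  data_structure:
    filter:
      - filter_type: check-required-fields
        value: # DICOM fields that must be present in each file
          - PatientID
          - StudyDate
      - filter_type: series-description
        value: Macular Cube # filter by the DICOM series description
```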
Date values are specified as objects with a `year` (required) and optional `month` and `day` fields:
```yaml
value:
  year: 2023
  month: 6
  day: 15
```
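For instance, a sketch that bounds scan acquisition dates on both ends, with a year-only lower bound and a full-date upper bound:

```yaml
filter:
  - filter_type: scan-acquisition-min-date
    value:
      year: 2022 # month and day are optional
  - filter_type: scan-acquisition-max-date
    value:
      year: 2023
      month: 6
      day: 15
```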