Overview

What is Multiple Instance Learning?

Multiple Instance Learning (MIL) is a machine learning paradigm where:

Each slide is a “bag” containing many instances (cells)
You only have labels at the bag level (e.g., “cancer type” or “survival time”)
The model learns to aggregate instance-level information to make bag-level predictions
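The aggregation step above can be illustrated with a minimal, library-free sketch. It is not CellMIL's implementation: `mean_pool`, `attention_pool`, and the scoring vector `w` are hypothetical names chosen for illustration, and in a real model the scoring weights would be learned rather than fixed.

```python
import math

def mean_pool(bag):
    """Average instance feature vectors into one bag-level vector."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]

def attention_pool(bag, w):
    """Weight each instance by a softmax over its score, then sum."""
    # One scalar score per instance: dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(inst, w)) for inst in bag]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of instances, computed feature by feature.
    pooled = [sum(a * x for a, x in zip(weights, col)) for col in zip(*bag)]
    return pooled, weights

# One toy "slide": three cells, two features each (made-up values).
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [2.0, 0.0]  # hypothetical scoring vector; learned in practice

bag_vector = mean_pool(bag)
attn_vector, weights = attention_pool(bag, w)
```

Mean pooling treats every cell equally, while the attention-style variant lets cells with higher scores contribute more to the bag-level vector. Either vector would then feed a bag-level classifier or survival head.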

This fits digital pathology well because pathologists diagnose whole slides rather than annotating individual cells, so labels are usually available only at the slide level.

The CellMIL training pipeline works with data that’s already been processed through earlier steps:

Your job is to:

Predict one of two classes (e.g., cancer subtype A vs B).

Label format: Single column with 0/1 values

dataset = MILDataset(
    label="HISTOLOGY",  # Column with 0/1 labels
    # ...
)
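Before passing a label column to `MILDataset`, it may help to confirm that it really contains only 0/1 values. The sketch below is a hypothetical helper, not part of CellMIL, and the `HISTOLOGY` values are made up for illustration.

```python
from collections import Counter

def check_binary_labels(values):
    """Return class counts, raising if any value is not 0 or 1."""
    bad = [v for v in values if v not in (0, 1)]
    if bad:
        # repr() keeps the message safe for mixed types such as None or str.
        raise ValueError(f"non-binary labels found: {sorted(set(map(repr, bad)))}")
    return Counter(values)

# Made-up values standing in for a HISTOLOGY column, one entry per slide.
histology = [0, 1, 1, 0, 1]
counts = check_binary_labels(histology)
```

The returned counts also give a quick view of class balance, which is worth checking before training a binary classifier.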

Predict time-to-event outcomes (e.g., overall survival).

Label format: Tuple of (duration_column, event_column)

dataset = MILDataset(
    label=("OS_MONTHS", "OS_EVENT"),  # Duration and event status
    # ...
)
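A similar sanity check can be sketched for survival labels. This is a hypothetical helper, not CellMIL code, and it assumes the common convention that an event value of 1 means the event was observed and 0 means the case was censored; confirm the convention your data uses. The `OS_MONTHS` and `OS_EVENT` values are made up.

```python
def summarize_survival(durations, events):
    """Validate (duration, event) pairs and count events vs. censored cases."""
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    for d, e in zip(durations, events):
        if d < 0:
            raise ValueError(f"negative duration: {d}")
        if e not in (0, 1):
            raise ValueError(f"event must be 0 or 1, got {e!r}")
    n_events = sum(events)
    return {
        "n": len(durations),
        "events": n_events,                     # assumed: 1 = event observed
        "censored": len(durations) - n_events,  # assumed: 0 = censored
    }

# Made-up values standing in for the OS_MONTHS and OS_EVENT columns.
os_months = [12.0, 30.5, 7.2, 48.0]
os_event = [1, 0, 1, 0]
summary = summarize_survival(os_months, os_event)
```

A dataset with very few observed events gives a survival model little signal, so the event count is a useful number to look at early.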