cellmil.utils.dataset_from_dataset

Functions

create_processed_dataset_files(root, label, ...)

Create processed dataset files directly compatible with GNNMILDataset.

cellmil.utils.dataset_from_dataset.create_processed_dataset_files(root: Union[str, Path], label: str, pyg_datasets: List[CellGNNMILDataset], data: DataFrame, split: Literal['train', 'val', 'test'] = 'train', force_reload: bool = False) → str

Create processed dataset files directly compatible with GNNMILDataset.

This function reuses existing processed datasets, assigns them a different label, and writes the processed files exactly as GNNMILDataset would create them. After it has run, GNNMILDataset can be used normally with the new label.

Parameters:
  • root – Root directory where the processed dataset files will be saved

  • label – New label column name for classification

  • pyg_datasets – List of existing CellGNNMILDataset objects [train, val, test]

  • data – DataFrame containing metadata with the new labels

  • split – Dataset split to create: 'train', 'val', or 'test' (default 'train')

  • force_reload – Whether to force reprocessing even if processed files exist

Returns:

Path to the created processed file
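
The relabel-and-cache behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the cellmil implementation: relabel_and_cache, the JSON file format, the "processed" subdirectory, and the "<label>_<split>.json" file name are all hypothetical stand-ins, and plain dictionaries take the place of the graph datasets and the metadata DataFrame. It only shows how an existing processed file might be reused unless force_reload is set.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def relabel_and_cache(
    root: Union[str, Path],
    label: str,
    samples: List[dict],
    labels_by_id: Dict[str, int],
    split: str = "train",
    force_reload: bool = False,
) -> str:
    """Hypothetical stand-in (not cellmil code): attach new labels to
    existing samples and write them to a processed file."""
    processed_dir = Path(root) / "processed"
    processed_dir.mkdir(parents=True, exist_ok=True)
    # The file name is an illustrative assumption, not the library's layout.
    out_path = processed_dir / f"{label}_{split}.json"

    # Reuse the existing file unless the caller asks for a rebuild.
    if out_path.exists() and not force_reload:
        return str(out_path)

    # Keep only samples that have a value for the new label.
    relabeled = [
        {**s, "y": labels_by_id[s["id"]]}
        for s in samples
        if s["id"] in labels_by_id
    ]
    out_path.write_text(json.dumps(relabeled))
    return str(out_path)


with tempfile.TemporaryDirectory() as tmp:
    samples = [{"id": "slide_a", "y": 0}, {"id": "slide_b", "y": 1}]
    first = relabel_and_cache(tmp, "grade", samples, {"slide_a": 2, "slide_b": 3})

    # A second call with different labels finds the file and leaves it unchanged.
    relabel_and_cache(tmp, "grade", samples, {"slide_a": 9, "slide_b": 9})
    cached = json.loads(Path(first).read_text())

    # force_reload=True rewrites the file with the new labels.
    relabel_and_cache(
        tmp, "grade", samples, {"slide_a": 9, "slide_b": 9}, force_reload=True
    )
    rebuilt = json.loads(Path(first).read_text())
```

In this sketch the returned path is stable across calls, so callers can treat a repeated call as cheap and pass force_reload=True only when the label metadata has changed.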