cellmil.datamodels.model

Functions

convert_numpy_types(obj)

Recursively convert numpy types to Python native types for JSON serialization.

Classes

ExperimentMetadata(name, k_folds, ...)

Metadata for the entire k-fold experiment.

FoldMetadata(fold_idx, train_size, val_size, ...)

Metadata for a single fold.

ModelStorage(output_dir, experiment_name[, ...])

Manages storage and retrieval of k-fold cross-validation results.

cellmil.datamodels.model.convert_numpy_types(obj: Any) -> Any

Recursively convert numpy types to Python native types for JSON serialization.

Parameters:

obj – Object to convert

Returns:

Object with numpy types converted to Python native types
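The recursion described above can be sketched as follows. This is a hedged illustration, not the cellmil implementation: the name `to_native` is hypothetical, and the sketch duck-types on the `tolist()` method (which NumPy arrays and NumPy scalars both provide) so that it runs without importing NumPy. Dictionary keys are left untouched here, which is a simplification.

```python
import json
from typing import Any


def to_native(obj: Any) -> Any:
    """Hypothetical stand-in for a numpy-to-native converter."""
    # Containers: recurse into every element.
    if isinstance(obj, dict):
        return {key: to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(item) for item in obj]
    # NumPy arrays and scalars expose tolist(), which yields nested lists
    # or a plain Python scalar; recurse once more in case of nested objects.
    if hasattr(obj, "tolist"):
        return to_native(obj.tolist())
    # Already a native type (int, float, str, bool, None, ...).
    return obj


# Native input passes through unchanged and is JSON-serializable.
serialized = json.dumps(to_native({"loss": 0.25, "sizes": (80, 20)}))
```

Note that tuples come back as lists, which matches what JSON would do with them anyway.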

class cellmil.datamodels.model.FoldMetadata(fold_idx: int, train_size: int, val_size: int, best_epoch: int, best_metric_value: float, metric_name: str, is_survival: bool, metrics: dict[str, Any])

Bases: object

Metadata for a single fold.

fold_idx: int
train_size: int
val_size: int
best_epoch: int
best_metric_value: float
metric_name: str
is_survival: bool
metrics: dict[str, Any]
__init__(fold_idx: int, train_size: int, val_size: int, best_epoch: int, best_metric_value: float, metric_name: str, is_survival: bool, metrics: dict[str, Any]) -> None
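The field listing and generated `__init__` suggest a dataclass, although the page does not confirm this. Under that assumption, the sketch below defines a stand-in class (`FoldMetadataSketch`, a hypothetical name) with the documented fields and shows a JSON round trip. All values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class FoldMetadataSketch:
    # Stand-in mirroring the documented FoldMetadata fields; not the real class.
    fold_idx: int
    train_size: int
    val_size: int
    best_epoch: int
    best_metric_value: float
    metric_name: str
    is_survival: bool
    metrics: dict[str, Any]


# Illustrative values only.
meta = FoldMetadataSketch(
    fold_idx=0,
    train_size=80,
    val_size=20,
    best_epoch=12,
    best_metric_value=0.87,
    metric_name="val_auc",
    is_survival=False,
    metrics={"val_auc": 0.87},
)

# Serialize to JSON text and rebuild an equal instance from it.
payload = json.dumps(asdict(meta), indent=2)
restored = FoldMetadataSketch(**json.loads(payload))
```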
class cellmil.datamodels.model.ExperimentMetadata(name: str, k_folds: int, random_state: int, balance_cell_counts: bool, cell_balance_bins: int, is_survival: bool, aggregated_metrics: dict[str, Any], best_fold_idx: int, avg_best_epoch: float, dataset_config: dict[str, Any], model_config: dict[str, Any])

Bases: object

Metadata for the entire k-fold experiment.

name: str
k_folds: int
random_state: int
balance_cell_counts: bool
cell_balance_bins: int
is_survival: bool
aggregated_metrics: dict[str, Any]
best_fold_idx: int
avg_best_epoch: float
dataset_config: dict[str, Any]
model_config: dict[str, Any]
__init__(name: str, k_folds: int, random_state: int, balance_cell_counts: bool, cell_balance_bins: int, is_survival: bool, aggregated_metrics: dict[str, Any], best_fold_idx: int, avg_best_epoch: float, dataset_config: dict[str, Any], model_config: dict[str, Any]) -> None
class cellmil.datamodels.model.ModelStorage(output_dir: Union[str, Path], experiment_name: str, load_existing: bool = False)

Bases: object

Manages storage and retrieval of k-fold cross-validation results.

Directory structure:

{output_dir}/
├── experiment_metadata.json
├── fold_0/
│   ├── best_model.ckpt
│   ├── train_indices.json
│   ├── val_indices.json
│   ├── predictions.csv
│   ├── transforms/
│   │   ├── pipeline_config.json
│   │   ├── transform_0_*.json
│   │   └── …
│   ├── label_transforms/
│   │   ├── pipeline.json
│   │   ├── transform_0.json
│   │   └── …
│   └── metadata.json
├── fold_1/
│   └── …
├── …
└── final_model/
    ├── final_model.ckpt
    └── metadata.json

__init__(output_dir: Union[str, Path], experiment_name: str, load_existing: bool = False)

Initialize ModelStorage.

Parameters:
  • output_dir – Base directory for storing results

  • experiment_name – Name of the experiment

  • load_existing – If True, load from existing directory without versioning

save_fold_results(fold_idx: int, checkpoint_path: Union[str, Path], train_indices: list[int], val_indices: list[int], predictions: dict[str, Any], metadata: FoldMetadata, transforms: Optional[Any] = None, label_transforms: Optional[Any] = None) -> None

Save all results for a single fold.

Parameters:
  • fold_idx – Fold index

  • checkpoint_path – Path to the best checkpoint for this fold

  • train_indices – Training indices

  • val_indices – Validation indices

  • predictions – Dictionary with ‘y_true’ and ‘y_pred’ arrays

  • metadata – Fold metadata

  • transforms – Optional feature transforms

  • label_transforms – Optional label transforms
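To make the `predictions` argument concrete, here is a minimal sketch of how a dictionary with `y_true` and `y_pred` entries could end up as the `predictions.csv` file in the fold directory. The helper name `write_predictions` and the one-column-per-key layout are assumptions; the page does not state how cellmil writes the file.

```python
import csv
import tempfile
from pathlib import Path


def write_predictions(path: Path, predictions: dict[str, list[float]]) -> None:
    """Write one CSV column per dictionary key (assumed layout)."""
    columns = list(predictions)
    # zip(*...) turns the per-column sequences into per-sample rows.
    rows = zip(*(predictions[name] for name in columns))
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "predictions.csv"
    # Illustrative values only.
    write_predictions(target, {"y_true": [0, 1, 1], "y_pred": [0.2, 0.7, 0.9]})
    written = target.read_text().splitlines()
```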

save_experiment_metadata(metadata: ExperimentMetadata) -> None

Save overall experiment metadata.

save_final_model(checkpoint_path: Union[str, Path], avg_epochs: float, final_metrics: dict[str, Any], transforms: Optional[Any] = None, label_transforms: Optional[Any] = None) -> None

Save the final model, which is trained for the average number of epochs (avg_epochs).

Parameters:
  • checkpoint_path – Path to final model checkpoint

  • avg_epochs – Average number of epochs used

  • final_metrics – Metrics from final model

  • transforms – Optional feature transforms

  • label_transforms – Optional label transforms

load_fold_checkpoint(fold_idx: int) -> Path

Load checkpoint path for a specific fold.

load_final_checkpoint() -> Path

Load the final model checkpoint.

load_fold_predictions(fold_idx: int) -> DataFrame

Load predictions for a specific fold.

load_all_predictions() -> DataFrame

Load and concatenate predictions from all folds.

load_fold_transforms(fold_idx: int) -> tuple[Any, Any]

Load transforms for a specific fold.

load_final_transforms() -> tuple[Any, Any]

Load transforms for the final model.

get_average_best_epoch() -> float

Calculate average of best epochs across all folds.
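As a small worked example of the calculation, the sketch below averages `best_epoch` values keyed by fold index. The dictionary and its values are illustrative stand-ins for per-fold metadata, not output from cellmil.

```python
from statistics import fmean

# Hypothetical best_epoch values keyed by fold index.
best_epochs = {0: 12, 1: 15, 2: 9}

# Arithmetic mean across folds; fmean always returns a float.
avg_best_epoch = fmean(best_epochs.values())
```

Because the result is a float, a caller that needs a whole number of training epochs has to choose its own rounding rule.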

get_experiment_summary() -> dict[str, Any]

Get a summary of the entire experiment.

get_fold_indices(fold_idx: int) -> tuple[list[int], list[int]]

Get train and validation indices for a specific fold.
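Given the documented layout (`fold_{idx}/train_indices.json` and `fold_{idx}/val_indices.json`), reading the indices directly from disk might look like the sketch below. The helper `read_fold_indices` is hypothetical, and it assumes each JSON file holds a plain list of integers, which the page does not confirm.

```python
import json
import tempfile
from pathlib import Path


def read_fold_indices(experiment_dir: Path, fold_idx: int) -> tuple[list[int], list[int]]:
    """Read (train, val) index lists for one fold from the assumed layout."""
    fold_dir = experiment_dir / f"fold_{fold_idx}"
    train = json.loads((fold_dir / "train_indices.json").read_text())
    val = json.loads((fold_dir / "val_indices.json").read_text())
    return train, val


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny fold directory with illustrative indices.
    (root / "fold_0").mkdir()
    (root / "fold_0" / "train_indices.json").write_text(json.dumps([0, 1, 2, 3]))
    (root / "fold_0" / "val_indices.json").write_text(json.dumps([4, 5]))
    train_idx, val_idx = read_fold_indices(root, 0)
    # A quick sanity check: train and validation sets should not overlap.
    overlap = set(train_idx) & set(val_idx)
```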

classmethod from_directory(experiment_dir: Union[str, Path]) -> ModelStorage

Load an existing experiment from a directory.

Parameters:

experiment_dir – Path to the experiment directory

Returns:

ModelStorage instance with loaded metadata

Example

>>> storage = ModelStorage.from_directory("/path/to/experiments/my_experiment")
>>> print(storage.experiment_metadata)
>>> predictions = storage.load_all_predictions()
_load_experiment_metadata() -> None

Load experiment metadata from disk.

_load_fold_metadata(fold_idx: int) -> None

Load metadata for a specific fold.

_load_all_metadata() -> None

Load all experiment and fold metadata from disk.

list_folds() -> list[int]

Get list of available fold indices.

has_final_model() -> bool

Check if a final model exists.
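Both checks can be approximated by inspecting the documented directory layout. The sketch below is an assumption about how such discovery could work, not the cellmil implementation; `find_folds` and `final_model_exists` are hypothetical helpers.

```python
import re
import tempfile
from pathlib import Path

# Matches directory names such as "fold_0" or "fold_12".
FOLD_DIR = re.compile(r"fold_(\d+)")


def find_folds(experiment_dir: Path) -> list[int]:
    """Return fold indices found as fold_<n>/ subdirectories, sorted numerically."""
    indices = []
    for child in experiment_dir.iterdir():
        match = FOLD_DIR.fullmatch(child.name)
        if child.is_dir() and match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def final_model_exists(experiment_dir: Path) -> bool:
    """True when final_model/final_model.ckpt is present."""
    return (experiment_dir / "final_model" / "final_model.ckpt").is_file()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("fold_10", "fold_2", "fold_0", "final_model"):
        (root / name).mkdir()
    folds = find_folds(root)
    before = final_model_exists(root)
    (root / "final_model" / "final_model.ckpt").touch()
    after = final_model_exists(root)
```

Parsing the index as an integer before sorting keeps `fold_10` after `fold_2`, which a plain string sort would not.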