cellmil.datamodels.model

Functions

convert_numpy_types(obj)

Recursively convert numpy types to Python native types for JSON serialization.

Classes

`ExperimentMetadata`(name, k_folds, ...)	Metadata for the entire k-fold experiment.
`FoldMetadata`(fold_idx, train_size, val_size, ...)	Metadata for a single fold.
`ModelStorage`(output_dir, experiment_name[, ...])	Manages storage and retrieval of k-fold cross-validation results.

cellmil.datamodels.model.convert_numpy_types(obj: Any) → Any

Recursively convert numpy types to Python native types for JSON serialization.

Parameters: obj – Object to convert
Returns: Object with numpy types converted to Python native types
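The recursion described above can be sketched as follows. This is not the cellmil implementation: `to_native` is a hypothetical stand-in that relies on the `tolist()` method exposed by both `numpy.ndarray` and NumPy scalars, so the sketch runs without importing NumPy. Leaving dictionary keys untouched is a simplification of this sketch.

```python
from typing import Any


def to_native(obj: Any) -> Any:
    """Hypothetical stand-in for convert_numpy_types (duck-typed, no NumPy import)."""
    if isinstance(obj, dict):
        # Values are converted recursively; keys are left as-is in this sketch.
        return {key: to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, which is how JSON would represent them anyway.
        return [to_native(value) for value in obj]
    if hasattr(obj, "tolist"):
        # numpy.ndarray.tolist() returns nested Python lists, and NumPy
        # scalars return the matching Python scalar.
        return obj.tolist()
    # Plain Python values (str, int, float, bool, None) pass through unchanged.
    return obj
```

Passing the result to `json.dumps` should then succeed for the usual mix of dicts, lists, arrays, and scalars found in a metrics dictionary.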

class cellmil.datamodels.model.FoldMetadata(fold_idx: int, train_size: int, val_size: int, best_epoch: int, best_metric_value: float, metric_name: str, is_survival: bool, metrics: dict[str, Any])

Bases: object

Metadata for a single fold.

fold_idx: int

train_size: int

val_size: int

best_epoch: int

best_metric_value: float

metric_name: str

is_survival: bool

metrics: dict[str, Any]

__init__(fold_idx: int, train_size: int, val_size: int, best_epoch: int, best_metric_value: float, metric_name: str, is_survival: bool, metrics: dict[str, Any]) → None
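The typed fields and generated `__init__` are consistent with a dataclass, although the page does not say so. Under that assumption, a JSON round trip might look like the sketch below. `FoldMetadataSketch` is a stand-in that only mirrors the documented fields, and the values are illustrative, not results from a real run.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class FoldMetadataSketch:
    """Stand-in mirroring the documented FoldMetadata fields (not the real class)."""

    fold_idx: int
    train_size: int
    val_size: int
    best_epoch: int
    best_metric_value: float
    metric_name: str
    is_survival: bool
    metrics: dict[str, Any]


# Illustrative values only.
meta = FoldMetadataSketch(
    fold_idx=0,
    train_size=80,
    val_size=20,
    best_epoch=12,
    best_metric_value=0.5,
    metric_name="val_metric",
    is_survival=False,
    metrics={"val_metric": 0.5},
)

# asdict() produces a plain dict that json.dumps can serialize directly.
payload = json.dumps(asdict(meta), indent=2)

# Reading it back restores an equal instance.
restored = FoldMetadataSketch(**json.loads(payload))
```

If `metrics` contains NumPy values, they would need to pass through `convert_numpy_types` before `json.dumps`.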

class cellmil.datamodels.model.ExperimentMetadata(name: str, k_folds: int, random_state: int, balance_cell_counts: bool, cell_balance_bins: int, is_survival: bool, aggregated_metrics: dict[str, Any], best_fold_idx: int, avg_best_epoch: float, dataset_config: dict[str, Any], model_config: dict[str, Any])

Bases: object

Metadata for the entire k-fold experiment.

name: str

k_folds: int

random_state: int

balance_cell_counts: bool

cell_balance_bins: int

is_survival: bool

aggregated_metrics: dict[str, Any]

best_fold_idx: int

avg_best_epoch: float

dataset_config: dict[str, Any]

model_config: dict[str, Any]

__init__(name: str, k_folds: int, random_state: int, balance_cell_counts: bool, cell_balance_bins: int, is_survival: bool, aggregated_metrics: dict[str, Any], best_fold_idx: int, avg_best_epoch: float, dataset_config: dict[str, Any], model_config: dict[str, Any]) → None

class cellmil.datamodels.model.ModelStorage(output_dir: Union[str, Path], experiment_name: str, load_existing: bool = False)

Bases: object

Manages storage and retrieval of k-fold cross-validation results.

Directory structure:

{output_dir}/
├── experiment_metadata.json
├── fold_0/
│   ├── best_model.ckpt
│   ├── train_indices.json
│   ├── val_indices.json
│   ├── predictions.csv
│   ├── transforms/
│   │   ├── pipeline_config.json
│   │   ├── transform_0_*.json
│   │   └── …
│   ├── label_transforms/
│   │   ├── pipeline.json
│   │   ├── transform_0.json
│   │   └── …
│   └── metadata.json
├── fold_1/
│   └── …
├── …
└── final_model/
    ├── final_model.ckpt
    └── metadata.json
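To make the layout above concrete, the per-fold file locations can be derived with `pathlib`. `fold_paths` is a hypothetical helper written for this illustration and is not part of `ModelStorage`. The file and directory names are taken from the tree, and `"results"` is an arbitrary example directory.

```python
from pathlib import Path


def fold_paths(output_dir: Path, fold_idx: int) -> dict[str, Path]:
    """Hypothetical helper: expected locations of one fold's artifacts."""
    fold_dir = output_dir / f"fold_{fold_idx}"
    return {
        "checkpoint": fold_dir / "best_model.ckpt",
        "train_indices": fold_dir / "train_indices.json",
        "val_indices": fold_dir / "val_indices.json",
        "predictions": fold_dir / "predictions.csv",
        "transforms": fold_dir / "transforms",
        "label_transforms": fold_dir / "label_transforms",
        "metadata": fold_dir / "metadata.json",
    }


# Example: where fold 1 would live under an arbitrary output directory.
paths = fold_paths(Path("results"), 1)
```

In practice the `load_*` and `get_fold_indices` methods documented below are the intended way to reach these files. The helper only shows how the names map onto paths.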

__init__(output_dir: Union[str, Path], experiment_name: str, load_existing: bool = False)

Initialize ModelStorage.

Parameters:

output_dir – Base directory for storing results
experiment_name – Name of the experiment
load_existing – If True, load from existing directory without versioning

save_fold_results(fold_idx: int, checkpoint_path: Union[str, Path], train_indices: list[int], val_indices: list[int], predictions: dict[str, Any], metadata: FoldMetadata, transforms: Optional[Any] = None, label_transforms: Optional[Any] = None) → None

Save all results for a single fold.

Parameters:

fold_idx – Fold index
checkpoint_path – Path to the best checkpoint for this fold
train_indices – Training indices
val_indices – Validation indices
predictions – Dictionary with ‘y_true’ and ‘y_pred’ arrays
metadata – Fold metadata
transforms – Optional feature transforms
label_transforms – Optional label transforms
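The `predictions` argument is documented as a dictionary with 'y_true' and 'y_pred' arrays, and each fold directory holds a `predictions.csv`. The page does not specify the CSV columns, so the sketch below assumes one row per sample with two columns named after the dictionary keys. `write_predictions_csv` is a hypothetical helper, not a cellmil function, and the sample values are made up.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any


def write_predictions_csv(predictions: dict[str, Any], path: Path) -> None:
    """Hypothetical helper: one row per sample, columns y_true and y_pred."""
    y_true, y_pred = predictions["y_true"], predictions["y_pred"]
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["y_true", "y_pred"])
        writer.writerows(zip(y_true, y_pred))


# Write a tiny made-up example into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "predictions.csv"
    write_predictions_csv({"y_true": [0, 1, 1], "y_pred": [0.2, 0.9, 0.6]}, out_path)
    with out_path.open(newline="") as fh:
        rows = list(csv.reader(fh))
```

The length check fails early if the two arrays disagree, which is easier to diagnose than a silently truncated file.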

save_experiment_metadata(metadata: ExperimentMetadata) → None: Save overall experiment metadata.

save_final_model(checkpoint_path: Union[str, Path], avg_epochs: float, final_metrics: dict[str, Any], transforms: Optional[Any] = None, label_transforms: Optional[Any] = None) → None

Save the final model trained on average epochs.

Parameters:

checkpoint_path – Path to final model checkpoint
avg_epochs – Average number of epochs used
final_metrics – Metrics from final model
transforms – Optional feature transforms
label_transforms – Optional label transforms

load_fold_checkpoint(fold_idx: int) → Path: Load checkpoint path for a specific fold.

load_final_checkpoint() → Path: Load the final model checkpoint.

load_fold_predictions(fold_idx: int) → DataFrame: Load predictions for a specific fold.

load_all_predictions() → DataFrame: Load and concatenate predictions from all folds.

load_fold_transforms(fold_idx: int) → tuple[Any, Any]: Load transforms for a specific fold.

load_final_transforms() → tuple[Any, Any]: Load transforms for the final model.

get_average_best_epoch() → float: Calculate average of best epochs across all folds.
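The averaging step can be illustrated with a plain arithmetic mean over each fold's `best_epoch`. Whether cellmil uses exactly this mean is an assumption. `average_best_epoch` is a hypothetical function and the epoch values are invented.

```python
from statistics import mean


def average_best_epoch(best_epochs: list[int]) -> float:
    """Hypothetical sketch: arithmetic mean of the per-fold best epochs."""
    if not best_epochs:
        raise ValueError("no folds available")
    return float(mean(best_epochs))


# Invented best epochs for a three-fold run.
avg = average_best_epoch([10, 14, 12])
```

The result is a float, matching the `avg_best_epoch: float` field of `ExperimentMetadata` and the `avg_epochs: float` parameter of `save_final_model`.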

get_experiment_summary() → dict[str, Any]: Get a summary of the entire experiment.

get_fold_indices(fold_idx: int) → tuple[list[int], list[int]]: Get train and validation indices for a specific fold.

classmethod from_directory(experiment_dir: Union[str, Path]) → ModelStorage

Load an existing experiment from a directory.

Parameters: experiment_dir – Path to the experiment directory
Returns: ModelStorage instance with loaded metadata

Example

>>> storage = ModelStorage.from_directory("/path/to/experiments/my_experiment")
>>> print(storage.experiment_metadata)
>>> predictions = storage.load_all_predictions()

_load_experiment_metadata() → None: Load experiment metadata from disk.

_load_fold_metadata(fold_idx: int) → None: Load metadata for a specific fold.

_load_all_metadata() → None: Load all experiment and fold metadata from disk.

list_folds() → list[int]: Get list of available fold indices.

has_final_model() → bool: Check if a final model exists.
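Given the documented directory structure, both checks can be approximated by inspecting the experiment directory. The sketch below is not the cellmil implementation. `list_fold_indices` and `final_model_exists` are hypothetical names, and treating the presence of `final_model/final_model.ckpt` as the test for a final model is an assumption.

```python
import re
import tempfile
from pathlib import Path

# Matches directory names such as fold_0, fold_1, ...
FOLD_DIR = re.compile(r"fold_(\d+)")


def list_fold_indices(experiment_dir: Path) -> list[int]:
    """Hypothetical sketch: sorted indices of the fold_<n> subdirectories."""
    indices = []
    for child in experiment_dir.iterdir():
        match = FOLD_DIR.fullmatch(child.name)
        if match and child.is_dir():
            indices.append(int(match.group(1)))
    return sorted(indices)


def final_model_exists(experiment_dir: Path) -> bool:
    """Hypothetical sketch: True if final_model/final_model.ckpt is present."""
    return (experiment_dir / "final_model" / "final_model.ckpt").is_file()


# Build a throwaway directory tree to exercise both helpers.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in (2, 0, 1):
        (root / f"fold_{i}").mkdir()
    (root / "final_model").mkdir()
    (root / "final_model" / "final_model.ckpt").touch()
    folds = list_fold_indices(root)
    has_final = final_model_exists(root)
```

Sorting numerically rather than by name keeps `fold_10` after `fold_9` when an experiment has ten or more folds.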