cellmil.visualization
- class cellmil.visualization.FeatureVisualizer(config: FeatureVisualizerConfig)
Bases: object
- __init__(config: FeatureVisualizerConfig)
- static _sample_data(features: ndarray[Any, Any], labels: numpy.ndarray[Any, Any] | None, n_samples: int) → tuple[numpy.ndarray[Any, Any], numpy.ndarray[Any, Any] | None, str]
Sample the data down to n_samples rows if it has more than n_samples rows. Returns: (sampled_features, sampled_labels, sample_info_string)
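A minimal sketch of the sampling step, assuming uniform random sampling without replacement. The `sample_data` name, the `seed` parameter and the wording of the info string are illustrative, not taken from the source.

```python
from typing import Optional, Tuple

import numpy as np


def sample_data(
    features: np.ndarray,
    labels: Optional[np.ndarray],
    n_samples: int,
    seed: int = 0,  # hypothetical: fixed seed for reproducible plots
) -> Tuple[np.ndarray, Optional[np.ndarray], str]:
    """Randomly keep n_samples rows (without replacement) if there are more than that."""
    n_total = features.shape[0]
    if n_total <= n_samples:
        # Small enough already: return everything unchanged, with no info text.
        return features, labels, ""
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_total, size=n_samples, replace=False)
    sampled_labels = labels[idx] if labels is not None else None
    info = f" (sampled {n_samples} of {n_total} cells)"
    return features[idx], sampled_labels, info


X = np.arange(20).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10)
X_s, y_s, info = sample_data(X, y, 4)
```

Features and labels are indexed with the same `idx`, so each sampled row keeps its label.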
- static _validate_positive_int(value: int | None, default: int) → int
Validate and return a positive integer, or default if invalid.
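One possible reading of "invalid", as a sketch: `None`, non-integers and values ≤ 0 fall back to `default`. The function name and these exact rules are assumptions.

```python
from typing import Optional


def validate_positive_int(value: Optional[int], default: int) -> int:
    """Return value if it is a positive int, otherwise fall back to default."""
    # bool is a subclass of int in Python, so exclude True/False explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return default
```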
- static _adjust_perplexity(n_samples: int, requested_perplexity: int) → int
Adjust the requested perplexity so that it is valid for the given number of samples (t-SNE requires the perplexity to be smaller than the number of samples).
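A sketch of one way to clamp the perplexity. scikit-learn's `TSNE` rejects a perplexity that is not smaller than the number of samples; the tighter `(n_samples - 1) // 3` bound used here is a common heuristic (scikit-learn's Barnes-Hut solver looks at roughly `3 * perplexity` nearest neighbours) and is not necessarily what `_adjust_perplexity` does.

```python
def adjust_perplexity(n_samples: int, requested_perplexity: int) -> int:
    """Clamp perplexity into [1, (n_samples - 1) // 3] (heuristic upper bound)."""
    upper = max(1, (n_samples - 1) // 3)
    return max(1, min(requested_perplexity, upper))
```

For 1000 cells the default of 30 passes through unchanged; for 10 cells it drops to 3.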
- static _create_error_message(title: str, message: str, style_key: str = 'error') → Div
Create a standardized error message component.
Create a standardized message for when cell type data is unavailable.
- _build_path_from_values(*selected_values: str | None) → List[str]
Build a path list from the selected dropdown values.
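A sketch of the path-building step. It assumes the dropdowns are ordered from the top directory level down and that the path stops at the first unselected (`None` or empty) dropdown; both the stopping rule and the function name are assumptions.

```python
from typing import List, Optional


def build_path_from_values(*selected_values: Optional[str]) -> List[str]:
    """Collect dropdown selections in order, stopping at the first unset one."""
    path: List[str] = []
    for value in selected_values:
        if not value:
            break  # deeper levels are meaningless once a parent level is unselected
        path.append(value)
    return path
```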
- _standardize_and_fit_pca(features: ndarray[Any, Any], n_components: int = 2) → tuple[numpy.ndarray[Any, Any], sklearn.decomposition._pca.PCA]
Standardize features and fit PCA.
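A sketch of standardize-then-PCA with scikit-learn's `StandardScaler` and `PCA`, returning the projected coordinates together with the fitted model, matching the return type in the signature. The function name and the demo data are illustrative.

```python
from typing import Tuple

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def standardize_and_fit_pca(features: np.ndarray, n_components: int = 2) -> Tuple[np.ndarray, PCA]:
    """Z-score every feature column, then project the rows onto the top principal components."""
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(scaled)
    return coords, pca


rng = np.random.default_rng(0)
# Independent features on wildly different scales.
X = rng.normal(size=(200, 5)) * [1, 10, 100, 1000, 10000]
coords, pca = standardize_and_fit_pca(X)
```

Standardizing first matters for cell features measured in different units: without it, the largest-scale column in the demo would account for almost all of the first component's variance.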
- _standardize_and_fit_tsne(features: ndarray[Any, Any], perplexity: int) → ndarray[Any, Any]
Standardize features and fit t-SNE.
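The t-SNE counterpart, again a sketch with scikit-learn. `init="pca"` and the fixed `random_state` are assumptions chosen for reproducible layouts; the source does not state which t-SNE settings are used.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def standardize_and_fit_tsne(features: np.ndarray, perplexity: int, seed: int = 42) -> np.ndarray:
    """Z-score every feature column, then embed the rows in 2-D with t-SNE."""
    scaled = StandardScaler().fit_transform(features)
    # scikit-learn raises an error unless perplexity < number of rows.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scaled)


X = np.random.default_rng(0).normal(size=(60, 8))
embedding = standardize_and_fit_tsne(X, perplexity=10)
```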
- _create_scatter_by_labels(coordinates: ndarray[Any, Any], labels: ndarray[Any, Any], label_names: Dict[int, str], title: str, xlabel: str, ylabel: str, sample_info: str = '') → Figure
Create a scatter plot colored by labels (cell types or slides). Shared by the PCA and t-SNE by-cell-type methods to avoid duplicating plotting code.
- _create_js_divergence_table_component(js_df: DataFrame, reference_cell_type_name: str, is_combined: bool = False) → Div
Create a standardized Jensen-Shannon (JS) divergence table component. Shared by the single-slide and combined-dataset views to avoid duplicating code.
- _get_available_slides() → List[str]
Get the list of available slide folders in the dataset directory.
- _explore_directory(path: Path, current_path_parts: Optional[List[str]] = None) → Dict[str, Any]
Recursively explore directory structure to find features.pt files. Returns a nested dictionary structure representing the directory tree.
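A minimal sketch of the recursive scan, assuming a hypothetical layout in which each directory maps to a dict of its sub-directories and the reserved key `FEATURES_KEY` marks a directory that contains `features.pt`. Branches with no `features.pt` anywhere below them are pruned. The sketch omits the `current_path_parts` argument, and the real return structure may differ.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

FEATURES_KEY = "__features__"  # hypothetical marker: "features.pt lives in this directory"


def explore_directory(path: Path) -> Dict[str, Any]:
    """Return a nested dict of every sub-directory that (transitively) contains features.pt."""
    node: Dict[str, Any] = {}
    features_file = path / "features.pt"
    if features_file.is_file():
        node[FEATURES_KEY] = str(features_file)
    for child in sorted(p for p in path.iterdir() if p.is_dir()):
        subtree = explore_directory(child)
        if subtree:  # prune branches with no features.pt anywhere below them
            node[child.name] = subtree
    return node


# Demo on a throwaway directory tree (directory names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "slide_1" / "extractor_a").mkdir(parents=True)
    (root / "slide_1" / "extractor_a" / "features.pt").write_bytes(b"")
    (root / "slide_1" / "no_features").mkdir()
    tree = explore_directory(root)
```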
- _get_available_options_at_level(structure: Dict[str, Any], path_parts: List[str]) → List[str]
Get available options at a specific level in the directory structure.
- _can_load_features(structure: Dict[str, Any], path_parts: List[str]) → bool
Check if we can load features at the current path.
- _get_features_path(structure: Dict[str, Any], path_parts: List[str]) → str
Get the full path to the features.pt file for the given path parts.
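A sketch of how these three lookup helpers could walk a nested directory dictionary. It assumes a hypothetical layout (sub-directories as keys, a reserved `FEATURES_KEY` marking where `features.pt` lives); the function names mirror the methods above but are standalone illustrations, and raising `KeyError` for a missing file is an assumption.

```python
from typing import Any, Dict, List, Optional

FEATURES_KEY = "__features__"  # hypothetical marker: "features.pt lives in this directory"


def _node_at(structure: Dict[str, Any], path_parts: List[str]) -> Optional[Dict[str, Any]]:
    """Follow path_parts down the nested dict; None if any part is missing."""
    node: Any = structure
    for part in path_parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, dict) else None


def get_available_options_at_level(structure: Dict[str, Any], path_parts: List[str]) -> List[str]:
    node = _node_at(structure, path_parts)
    return sorted(k for k in node if k != FEATURES_KEY) if node else []


def can_load_features(structure: Dict[str, Any], path_parts: List[str]) -> bool:
    node = _node_at(structure, path_parts)
    return node is not None and FEATURES_KEY in node


def get_features_path(structure: Dict[str, Any], path_parts: List[str]) -> str:
    node = _node_at(structure, path_parts)
    if node is None or FEATURES_KEY not in node:
        raise KeyError(f"no features.pt under {'/'.join(path_parts)!r}")
    return node[FEATURES_KEY]


tree = {"slide_1": {"extractor_a": {FEATURES_KEY: "/data/slide_1/extractor_a/features.pt"}}}
```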
- _load_features(slide_name: str, path_parts: List[str])
Load features for the specified slide and path parts.
- _prepare_data(slide_name: str, path_parts: List[str]) → Dict[str, Any]
Prepare data for visualization by loading features and converting to DataFrame.
- _load_cell_types(slide_name: str, path_parts: List[str]) → Optional[Dict[int, int]]
Load cell types for the specified slide and path parts. Returns a dictionary mapping cell_id to cell_type.
- _prepare_data_with_cell_types(slide_name: str, path_parts: List[str]) → Dict[str, Any]
Prepare data with cell types for visualization.
- _prepare_combined_data(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]
Prepare combined data from multiple slides for dataset-wide analysis. Samples up to max_samples_per_slide cells from each slide; if max_samples_per_slide is None, all cells from each slide are used.
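A sketch of per-slide capping that works on in-memory arrays instead of loading from disk. It assumes uniform random sampling within each slide and encodes each row's slide as an integer index into `slides`, consistent with the `slide_labels`/`slides` arguments of the combined plotting methods; `combine_slides` and `per_slide_features` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


def combine_slides(
    per_slide_features: Dict[str, np.ndarray],
    slides: List[str],
    max_samples_per_slide: Optional[int] = 1000,
    seed: int = 0,
) -> Tuple[np.ndarray, np.ndarray]:
    """Stack features from several slides, capping each slide at max_samples_per_slide rows.

    Returns (features, slide_labels); slide_labels[i] indexes into `slides`.
    """
    rng = np.random.default_rng(seed)
    chunks, labels = [], []
    for slide_idx, slide in enumerate(slides):
        feats = per_slide_features[slide]
        if max_samples_per_slide is not None and len(feats) > max_samples_per_slide:
            idx = rng.choice(len(feats), size=max_samples_per_slide, replace=False)
            feats = feats[idx]
        chunks.append(feats)
        labels.append(np.full(len(feats), slide_idx))
    return np.concatenate(chunks), np.concatenate(labels)


data = {"a": np.zeros((5, 3)), "b": np.ones((2000, 3))}
X, slide_labels = combine_slides(data, ["a", "b"], max_samples_per_slide=1000)
```

Capping per slide, rather than sampling from the pooled cells, keeps one very large slide from dominating the combined PCA or t-SNE view.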
- _prepare_combined_data_with_cell_types(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]
Prepare combined data with cell types from multiple slides. If max_samples_per_slide is None, use all cells from each slide.
- _calculate_first_order_stats(data: ndarray[Any, Any]) → Dict[str, Any]
Calculate first-order statistics for features.
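The source does not list which statistics are computed. A sketch assuming a common set (mean, standard deviation, min, max, median, and moment-based skewness and excess kurtosis), computed column-wise so that each feature gets one value per statistic; the dictionary keys are assumptions.

```python
from typing import Any, Dict

import numpy as np


def calculate_first_order_stats(data: np.ndarray) -> Dict[str, Any]:
    """Per-feature (column-wise) summary statistics for an (n_cells, n_features) array."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    centered = data - mean
    # Avoid dividing by zero for constant features (their centered values are all 0 anyway).
    safe_std = np.where(std > 0, std, 1.0)
    return {
        "mean": mean,
        "std": std,
        "min": data.min(axis=0),
        "max": data.max(axis=0),
        "median": np.median(data, axis=0),
        "skewness": (centered**3).mean(axis=0) / safe_std**3,
        "kurtosis": (centered**4).mean(axis=0) / safe_std**4 - 3.0,  # excess kurtosis
    }


X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]])
stats = calculate_first_order_stats(X)
```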
- _create_correlation_matrix(df: DataFrame, feature_names: List[str]) → Figure
Create correlation matrix heatmap for features.
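The matrix behind such a heatmap can be computed directly with pandas. A sketch assuming Pearson correlation (`DataFrame.corr` also accepts `"spearman"` and `"kendall"`); the feature names in the demo are made up.

```python
from typing import List

import numpy as np
import pandas as pd


def correlation_matrix(df: pd.DataFrame, feature_names: List[str]) -> pd.DataFrame:
    """Pairwise Pearson correlation between the selected feature columns."""
    return df[feature_names].corr(method="pearson")


rng = np.random.default_rng(0)
a = rng.normal(size=500)
df = pd.DataFrame({"area": a, "area_x2": 2 * a, "noise": rng.normal(size=500)})
corr = correlation_matrix(df, ["area", "area_x2", "noise"])
```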
- _create_distribution_plot(df: DataFrame, feature_name: str) → Figure
Create distribution plot for a specific feature.
- _create_pca_plot(features: ndarray[Any, Any], feature_names: List[str], n_samples: int = 1000) → Figure
Create PCA visualization.
- _create_tsne_plot(features: ndarray[Any, Any], n_samples: int = 1000, perplexity: int = 30) → Figure
Create t-SNE visualization.
- _create_stats_table(stats_dict: Dict[str, Any], feature_names: List[str]) → Figure
Create a table with first-order statistics.
- _create_distribution_comparison_plot(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure
Create overlaid distribution plots for different cell types with normalized densities and KDE curves.
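A sketch of the numbers behind this plot. It assumes histogram bins shared by all cell types, `density=True` normalization (each histogram integrates to 1, so cell types with very different cell counts stay comparable) and SciPy's `gaussian_kde` for the smooth curves. Plotting is left out, and the function and cell type names are illustrative.

```python
from typing import Dict

import numpy as np
from scipy.stats import gaussian_kde


def density_curves(
    values: np.ndarray,
    cell_types: np.ndarray,
    cell_type_names: Dict[int, str],
    bins: int = 30,
    grid_points: int = 200,
):
    """Normalized histograms and KDE curves of one feature, per cell type, on shared bins."""
    edges = np.histogram_bin_edges(values, bins=bins)
    grid = np.linspace(edges[0], edges[-1], grid_points)
    curves = {}
    for type_id, name in cell_type_names.items():
        subset = values[cell_types == type_id]
        if subset.size < 2 or np.ptp(subset) == 0:
            continue  # gaussian_kde needs at least two distinct values
        hist, _ = np.histogram(subset, bins=edges, density=True)
        curves[name] = {"hist": hist, "kde": gaussian_kde(subset)(grid)}
    return edges, grid, curves


rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 300)])
cell_types = np.repeat([1, 2], 300)
edges, grid, curves = density_curves(values, cell_types, {1: "type_A", 2: "type_B"})
```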
- _calculate_js_divergence_table(df: DataFrame, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], reference_cell_type: int) → DataFrame
Calculate the Jensen-Shannon (JS) divergence between the reference cell type and every other cell type for each feature in the DataFrame.
Returns a DataFrame where:
  - rows are features,
  - columns are cell types (excluding the reference type),
  - values are JS divergence scores.
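A sketch of such a table. It assumes each feature's values are binned on edges shared by all cell types and that the divergence is computed on the resulting histograms, using JS(P || Q) = ½ KL(P || M) + ½ KL(Q || M) with M = (P + Q) / 2 and base-2 logarithms, so scores lie in [0, 1]. Note that `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the JS distance), not the divergence. The bin count and function names are assumptions.

```python
from typing import Dict

import numpy as np
import pandas as pd


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # terms with a == 0 contribute 0; b > 0 wherever a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_divergence_table(
    df: pd.DataFrame,
    cell_types: np.ndarray,
    cell_type_names: Dict[int, str],
    reference_cell_type: int,
    bins: int = 30,
) -> pd.DataFrame:
    others = [t for t in cell_type_names if t != reference_cell_type]
    rows = {}
    for feature in df.columns:
        values = df[feature].to_numpy(dtype=float)
        edges = np.histogram_bin_edges(values, bins=bins)  # shared by all cell types
        ref_hist, _ = np.histogram(values[cell_types == reference_cell_type], bins=edges)
        row = {}
        for t in others:
            other_hist, _ = np.histogram(values[cell_types == t], bins=edges)
            if ref_hist.sum() == 0 or other_hist.sum() == 0:
                row[cell_type_names[t]] = np.nan  # no cells of one of the two types
            else:
                row[cell_type_names[t]] = js_divergence(ref_hist.astype(float), other_hist.astype(float))
        rows[feature] = row
    # Rows: features; columns: non-reference cell types.
    return pd.DataFrame.from_dict(rows, orient="index")


rng = np.random.default_rng(0)
cell_types = np.repeat([0, 1, 2], 400)
df = pd.DataFrame({
    "same": rng.normal(0, 1, 1200),
    "shifted": np.concatenate([rng.normal(0, 1, 800), rng.normal(6, 1, 400)]),
})
table = js_divergence_table(df, cell_types, {0: "ref", 1: "type_1", 2: "type_2"}, reference_cell_type=0)
```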
- _create_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000) → Figure
Create PCA visualization colored by cell type.
- _create_tsne_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000, perplexity: int = 30) → Figure
Create t-SNE visualization colored by cell type.
- _create_combined_pca_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure
Create PCA visualization colored by slide for combined dataset.
- _create_combined_tsne_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str], perplexity: int = 30) → Figure
Create t-SNE visualization colored by slide for combined dataset.
- _create_combined_cell_type_distribution(cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure
Create stacked bar chart showing cell type distribution across slides.
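A sketch of the data behind such a chart: per-slide cell type fractions from `pandas.crosstab`, with each row normalized to sum to 1 so that every slide becomes one full-height stacked bar. Plotting fractions rather than raw counts is an assumption, as are the function name and the demo labels.

```python
from typing import Dict, List

import numpy as np
import pandas as pd


def cell_type_fractions(
    cell_types: np.ndarray,
    cell_type_names: Dict[int, str],
    slide_labels: np.ndarray,
    slides: List[str],
) -> pd.DataFrame:
    """Fraction of each cell type within each slide (rows: slides, columns: cell types)."""
    df = pd.DataFrame({
        "slide": [slides[i] for i in slide_labels],
        "cell_type": [cell_type_names.get(int(t), f"unknown ({t})") for t in cell_types],
    })
    counts = pd.crosstab(df["slide"], df["cell_type"])
    return counts.div(counts.sum(axis=1), axis=0)


slides = ["slide_1", "slide_2"]
slide_labels = np.array([0, 0, 0, 1, 1])
cell_types = np.array([1, 1, 2, 2, 2])
fractions = cell_type_fractions(cell_types, {1: "type_A", 2: "type_B"}, slide_labels, slides)
```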
- _create_combined_distribution_comparison(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure
Create distribution comparison across all cell types for combined dataset.
- _create_combined_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure
Create PCA visualization colored by cell type for combined dataset.