cellmil.visualization

class cellmil.visualization.FeatureVisualizer(config: FeatureVisualizerConfig)

Bases: object

__init__(config: FeatureVisualizerConfig)

static _to_numpy(data: Any) → ndarray[Any, Any]: Convert various data types to a NumPy array.

static _sample_data(features: ndarray[Any, Any], labels: numpy.ndarray[Any, Any] | None, n_samples: int) → tuple[numpy.ndarray[Any, Any], numpy.ndarray[Any, Any] | None, str]: Sample the data if it contains more than n_samples entries. Returns (sampled_features, sampled_labels, sample_info_string).
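The sampling contract described for _sample_data can be sketched with the standard library alone. This is an illustration, not the class's implementation: the name sample_data, the seed parameter, the use of plain lists instead of NumPy arrays, and the wording of the info string are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_data(
    features: Sequence[Sequence[float]],
    labels: Optional[Sequence[int]],
    n_samples: int,
    seed: int = 0,  # hypothetical parameter, added for reproducibility
) -> Tuple[List[Sequence[float]], Optional[List[int]], str]:
    """Subsample rows (and matching labels) when there are more than n_samples."""
    total = len(features)
    if total <= n_samples:
        # Nothing to do: return everything and say so.
        kept_labels = list(labels) if labels is not None else None
        return list(features), kept_labels, f"all {total} samples"

    # Draw row indices without replacement; sorting keeps the original order.
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(total), n_samples))
    sampled_features = [features[i] for i in indices]
    sampled_labels = [labels[i] for i in indices] if labels is not None else None
    return sampled_features, sampled_labels, f"{n_samples} of {total} samples"
```

Using one index list for both features and labels is what keeps each row aligned with its label after sampling.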

static _validate_positive_int(value: int | None, default: int) → int: Validate and return a positive integer, or default if the value is invalid.
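A minimal sketch of this kind of validation follows. The function name and the decision to reject booleans are assumptions; the real method may treat edge cases differently.

```python
from typing import Optional


def validate_positive_int(value: Optional[int], default: int) -> int:
    """Return value when it is a positive integer, otherwise default."""
    # bool is a subclass of int in Python, so it is excluded explicitly here.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return default
```

A helper like this is useful for UI inputs, where a cleared number field typically arrives as None.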

static _adjust_perplexity(n_samples: int, requested_perplexity: int) → int: Adjust perplexity to be valid for the given number of samples.
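Recent scikit-learn releases reject a t-SNE perplexity that is not smaller than the number of samples, so an adjustment of this kind usually clamps the requested value. The sketch below shows one plausible rule; the exact bounds used by _adjust_perplexity are not documented here and are assumed.

```python
def adjust_perplexity(n_samples: int, requested_perplexity: int) -> int:
    """Clamp perplexity into the range [1, n_samples - 1]."""
    # Largest value strictly below n_samples, but never less than 1.
    upper = max(1, n_samples - 1)
    return max(1, min(requested_perplexity, upper))
```

With fewer than two samples no perplexity is valid, so a caller would still need to handle that case separately.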

static _create_error_message(title: str, message: str, style_key: str = 'error') → Div: Create a standardized error message component.

static _create_cell_type_unavailable_message() → Div: Create a standardized message for when cell type data is unavailable.

_build_path_from_values(*selected_values: str | None) → List[str]: Build the path list from the selected dropdown values.
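One way to turn a cascade of dropdown selections into a path is to collect values until the first unset level. This is a guess at the behavior, written as a standalone function; the real method may instead skip unset values rather than stop at them.

```python
from typing import List, Optional


def build_path_from_values(*selected_values: Optional[str]) -> List[str]:
    """Collect dropdown selections, stopping at the first unset level."""
    path: List[str] = []
    for value in selected_values:
        if not value:  # None or empty string: nothing chosen at this level
            break
        path.append(value)
    return path
```

Stopping at the first gap means a stale selection in a deeper dropdown cannot produce a path that skips a level.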

_standardize_and_fit_pca(features: ndarray[Any, Any], n_components: int = 2) → tuple[numpy.ndarray[Any, Any], sklearn.decomposition._pca.PCA]: Standardize features and fit PCA.

_standardize_and_fit_tsne(features: ndarray[Any, Any], perplexity: int) → ndarray[Any, Any]: Standardize features and fit t-SNE.

_create_scatter_by_labels(coordinates: ndarray[Any, Any], labels: ndarray[Any, Any], label_names: Dict[int, str], title: str, xlabel: str, ylabel: str, sample_info: str = '') → Figure: Create a scatter plot colored by labels (cell types or slides). Reduces duplication across PCA/t-SNE by cell type methods.

_create_js_divergence_table_component(js_df: DataFrame, reference_cell_type_name: str, is_combined: bool = False) → Div: Create a standardized JS divergence table component. Reduces duplication between single slide and combined dataset views.

_get_available_slides() → List[str]: Get list of available slide folders in the dataset directory.

_explore_directory(path: Path, current_path_parts: Optional[List[str]] = None) → Dict[str, Any]: Recursively explore directory structure to find features.pt files. Returns a nested dictionary structure representing the directory tree.
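The recursive exploration can be sketched with pathlib. The layout of the returned dictionary is not specified above, so the "_features_path" marker key and the pruning of branches without a features.pt file are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Dict

FEATURES_FILE = "features.pt"
LEAF_KEY = "_features_path"  # hypothetical marker key, not from the library


def explore_directory(path: Path) -> Dict[str, Any]:
    """Build a nested dict of subdirectories that lead to a features.pt file."""
    structure: Dict[str, Any] = {}

    # Record the features file if this directory holds one.
    features_file = path / FEATURES_FILE
    if features_file.is_file():
        structure[LEAF_KEY] = str(features_file)

    # Recurse into subdirectories in a stable (sorted) order.
    for child in sorted(p for p in path.iterdir() if p.is_dir()):
        subtree = explore_directory(child)
        if subtree:  # keep only branches that contain a features file
            structure[child.name] = subtree
    return structure
```

Each nesting level of the result corresponds to one directory level, which is what lets a chain of dropdowns walk the tree one choice at a time.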

_get_available_options_at_level(structure: Dict[str, Any], path_parts: List[str]) → List[str]: Get available options at a specific level in the directory structure.

_can_load_features(structure: Dict[str, Any], path_parts: List[str]) → bool: Check whether features can be loaded at the current path.

_get_features_path(structure: Dict[str, Any], path_parts: List[str]) → str: Get the full path to the features.pt file for the given path parts.
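The three lookups above all amount to walking a nested dictionary along path_parts. The sketch below assumes a tree in which a directory containing features.pt carries a "_features_path" entry; that key, and the function names, are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List, Optional

LEAF_KEY = "_features_path"  # hypothetical marker key, not from the library


def descend(structure: Dict[str, Any], path_parts: List[str]) -> Optional[Dict[str, Any]]:
    """Follow path_parts through the nested dict; None if the path is invalid."""
    node = structure
    for part in path_parts:
        child = node.get(part)
        if not isinstance(child, dict):
            return None
        node = child
    return node


def available_options(structure: Dict[str, Any], path_parts: List[str]) -> List[str]:
    """List the subdirectory names selectable below the given path."""
    node = descend(structure, path_parts)
    if node is None:
        return []
    return sorted(key for key in node if key != LEAF_KEY)


def can_load_features(structure: Dict[str, Any], path_parts: List[str]) -> bool:
    """True when the directory at the given path holds a features file."""
    node = descend(structure, path_parts)
    return node is not None and LEAF_KEY in node
```

Returning an empty list for an invalid path keeps a dropdown callback from raising when an upstream selection changes.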

_load_features(slide_name: str, path_parts: List[str]): Load features for the specified slide and path parts.

_prepare_data(slide_name: str, path_parts: List[str]) → Dict[str, Any]: Prepare data for visualization by loading features and converting them to a DataFrame.

_load_cell_types(slide_name: str, path_parts: List[str]) → Optional[Dict[int, int]]: Load cell types for the specified slide and path parts. Returns a dictionary mapping cell_id to cell_type.

_prepare_data_with_cell_types(slide_name: str, path_parts: List[str]) → Dict[str, Any]: Prepare data with cell types for visualization.

_prepare_combined_data(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]: Prepare combined data from multiple slides for dataset-wide analysis. Samples up to max_samples_per_slide cells from each slide. If max_samples_per_slide is None, all cells from each slide are used.

_prepare_combined_data_with_cell_types(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]: Prepare combined data with cell types from multiple slides. If max_samples_per_slide is None, all cells from each slide are used.

_calculate_first_order_stats(data: ndarray[Any, Any]) → Dict[str, Any]: Calculate first-order statistics for features.
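First-order statistics describe each feature's value distribution on its own, without reference to other features. Which statistics the method returns is not listed above; the sketch below picks a common set (mean, population standard deviation, minimum, maximum, median) as an assumption and works on a single feature column.

```python
import statistics
from typing import Dict, Sequence


def first_order_stats(values: Sequence[float]) -> Dict[str, float]:
    """Summarize one feature column with basic descriptive statistics."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }
```

For a full feature matrix the same summary would be computed once per column.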

_create_correlation_matrix(df: DataFrame, feature_names: List[str]) → Figure: Create correlation matrix heatmap for features.

_create_distribution_plot(df: DataFrame, feature_name: str) → Figure: Create distribution plot for a specific feature.

_create_pca_plot(features: ndarray[Any, Any], feature_names: List[str], n_samples: int = 1000) → Figure: Create PCA visualization.

_create_tsne_plot(features: ndarray[Any, Any], n_samples: int = 1000, perplexity: int = 30) → Figure: Create t-SNE visualization.

_create_stats_table(stats_dict: Dict[str, Any], feature_names: List[str]) → Figure: Create a table with first-order statistics.

_create_distribution_comparison_plot(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure: Create overlaid distribution plots for different cell types with normalized densities and KDE curves.

_calculate_js_divergence_table(df: DataFrame, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], reference_cell_type: int) → DataFrame: Calculate the Jensen-Shannon divergence between the reference cell type and every other cell type, for each feature in the dataframe.

Returns a DataFrame where:

- Rows are features
- Columns are cell types (excluding the reference type)
- Values are JS divergence scores
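Each cell of such a table is a Jensen-Shannon divergence between two distributions of the same feature. The sketch below computes it for two histograms over identical bins. It is not the method's implementation: the function names, the histogram inputs, and the base-2 logarithm (which bounds the result to [0, 1]) are assumptions.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p_counts: Sequence[float], q_counts: Sequence[float]) -> float:
    """Jensen-Shannon divergence between two histograms over the same bins."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    p: List[float] = [x / p_total for x in p_counts]
    q: List[float] = [x / q_total for x in q_counts]
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Identical histograms give 0 and histograms with no overlapping bins give 1, which makes the scores comparable across features with different value ranges.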

_create_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000) → Figure: Create PCA visualization colored by cell type.

_create_tsne_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000, perplexity: int = 30) → Figure: Create t-SNE visualization colored by cell type.

_create_combined_pca_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure: Create PCA visualization colored by slide for combined dataset.

_create_combined_tsne_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str], perplexity: int = 30) → Figure: Create t-SNE visualization colored by slide for combined dataset.

_create_combined_cell_type_distribution(cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure: Create stacked bar chart showing cell type distribution across slides.

_create_combined_distribution_comparison(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure: Create distribution comparison across all cell types for combined dataset.

_create_combined_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure: Create PCA visualization colored by cell type for combined dataset.

_create_combined_tsne_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], perplexity: int = 30) → Figure: Create t-SNE visualization colored by cell type for combined dataset.

visualize(host: str = '127.0.0.1', port: int = 8050, debug: bool = True): Launch the Dash web application for feature visualization.

Modules

cellmil.visualization.feature_visualizer