cellmil.visualization

class cellmil.visualization.FeatureVisualizer(config: FeatureVisualizerConfig)

Bases: object

__init__(config: FeatureVisualizerConfig)

static _to_numpy(data: Any) → ndarray[Any, Any]

Convert various data types to numpy array.
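The conversion rules are not spelled out here. A minimal sketch of one plausible approach follows, assuming tensor-like inputs expose `detach()`, `cpu()` and `numpy()` (as `torch.Tensor` does). The function name `to_numpy` is a hypothetical stand-in, not the library's implementation.

```python
from typing import Any

import numpy as np


def to_numpy(data: Any) -> np.ndarray:
    """Hypothetical stand-in for _to_numpy; the real rules may differ."""
    if isinstance(data, np.ndarray):
        # Already an array: return it unchanged.
        return data
    # Tensor-like objects (e.g. torch.Tensor) are detected by duck typing,
    # so this sketch does not need torch installed.
    if hasattr(data, "detach") and hasattr(data, "cpu"):
        return data.detach().cpu().numpy()
    # Lists, tuples and scalars fall back to np.asarray.
    return np.asarray(data)
```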

static _sample_data(features: ndarray[Any, Any], labels: numpy.ndarray[Any, Any] | None, n_samples: int) → tuple[numpy.ndarray[Any, Any], numpy.ndarray[Any, Any] | None, str]

Sample data if it exceeds n_samples. Returns: (sampled_features, sampled_labels, sample_info_string)
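The behaviour described above can be sketched as follows. This is an illustration under assumptions: the sampling is taken to be uniform without replacement, and the `seed` parameter and the wording of the info string are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def sample_data(
    features: np.ndarray,
    labels: Optional[np.ndarray],
    n_samples: int,
    seed: int = 0,  # hypothetical parameter, added for reproducibility
) -> Tuple[np.ndarray, Optional[np.ndarray], str]:
    """Hypothetical stand-in for _sample_data."""
    n_total = features.shape[0]
    if n_total <= n_samples:
        # Nothing to do: return the inputs and an empty info string.
        return features, labels, ""
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_total, size=n_samples, replace=False)
    # Index features and labels with the same indices so rows stay aligned.
    sampled_labels = labels[idx] if labels is not None else None
    info = f" (sampled {n_samples} of {n_total})"  # assumed format
    return features[idx], sampled_labels, info
```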

static _validate_positive_int(value: int | None, default: int) → int

Validate and return a positive integer, or default if invalid.
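A minimal sketch of this kind of check, assuming "invalid" means `None`, a non-integer, or a value less than or equal to zero. The exact criteria used by the library are not documented here, and the function name is a hypothetical stand-in.

```python
from typing import Optional


def validate_positive_int(value: Optional[int], default: int) -> int:
    """Hypothetical stand-in for _validate_positive_int."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return default
```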

static _adjust_perplexity(n_samples: int, requested_perplexity: int) → int

Adjust perplexity to be valid for the given number of samples.
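scikit-learn's t-SNE requires the perplexity to be smaller than the number of samples, which is presumably why an adjustment is needed. The sketch below clamps the requested value to `n_samples - 1`; the actual rule used by the library may be different (for example, a more conservative upper bound), and the function name is a hypothetical stand-in.

```python
def adjust_perplexity(n_samples: int, requested_perplexity: int) -> int:
    """Hypothetical stand-in for _adjust_perplexity."""
    # Largest perplexity that is still strictly below n_samples, at least 1.
    upper = max(1, n_samples - 1)
    return max(1, min(requested_perplexity, upper))
```

With 1000 samples a requested perplexity of 30 is returned unchanged, while with 10 samples it is reduced to 9.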

static _create_error_message(title: str, message: str, style_key: str = 'error') → Div

Create a standardized error message component.

static _create_cell_type_unavailable_message() → Div

Create a standardized message for when cell type data is unavailable.

_build_path_from_values(*selected_values: str | None) → List[str]

Build path list from selected dropdown values.

_standardize_and_fit_pca(features: ndarray[Any, Any], n_components: int = 2) → tuple[numpy.ndarray[Any, Any], sklearn.decomposition._pca.PCA]

Standardize features and fit PCA.
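The return type suggests the method yields the projected coordinates together with the fitted `PCA` object. A minimal sketch with scikit-learn, assuming standardization means zero mean and unit variance per feature via `StandardScaler` (the function name is a hypothetical stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def standardize_and_fit_pca(features: np.ndarray, n_components: int = 2):
    """Hypothetical stand-in for _standardize_and_fit_pca."""
    # Scale each feature to zero mean and unit variance before PCA,
    # so features with large ranges do not dominate the components.
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    coords = pca.fit_transform(scaled)
    return coords, pca


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))
coords, pca = standardize_and_fit_pca(features)
print(coords.shape, pca.explained_variance_ratio_)
```

Returning the fitted object keeps `explained_variance_ratio_` available, for example for axis labels.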

_standardize_and_fit_tsne(features: ndarray[Any, Any], perplexity: int) → ndarray[Any, Any]

Standardize features and fit t-SNE.
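A comparable sketch for t-SNE, again assuming `StandardScaler` for the standardization step and a two-dimensional embedding. The fixed `random_state` is an assumption made here for reproducibility, and the function name is a hypothetical stand-in.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def standardize_and_fit_tsne(features: np.ndarray, perplexity: int) -> np.ndarray:
    """Hypothetical stand-in for _standardize_and_fit_tsne."""
    scaled = StandardScaler().fit_transform(features)
    # perplexity must be smaller than the number of rows in `features`.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    return tsne.fit_transform(scaled)


# Synthetic data for illustration only.
features = np.random.default_rng(0).normal(size=(60, 5))
embedding = standardize_and_fit_tsne(features, perplexity=10)
print(embedding.shape)
```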

_create_scatter_by_labels(coordinates: ndarray[Any, Any], labels: ndarray[Any, Any], label_names: Dict[int, str], title: str, xlabel: str, ylabel: str, sample_info: str = '') → Figure

Create a scatter plot colored by labels (cell types or slides). Shared by the PCA and t-SNE methods that color by cell type or slide, to reduce duplication.

_create_js_divergence_table_component(js_df: DataFrame, reference_cell_type_name: str, is_combined: bool = False) → Div

Create a standardized JS divergence table component. Reduces duplication between single slide and combined dataset views.

_get_available_slides() → List[str]

Get list of available slide folders in the dataset directory.

_explore_directory(path: Path, current_path_parts: Optional[List[str]] = None) → Dict[str, Any]

Recursively explore directory structure to find features.pt files. Returns a nested dictionary structure representing the directory tree.
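The recursive exploration can be sketched with `pathlib`. The layout of the returned dictionary is not documented here, so the `_features_file` marker key below is an assumption, as are the directory names in the demo. The sketch also drops the `current_path_parts` argument for brevity.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

FEATURES_KEY = "_features_file"  # hypothetical marker key


def explore_directory(path: Path) -> Dict[str, Any]:
    """Hypothetical stand-in for _explore_directory."""
    tree: Dict[str, Any] = {}
    features_file = path / "features.pt"
    if features_file.is_file():
        tree[FEATURES_KEY] = str(features_file)
    for child in sorted(p for p in path.iterdir() if p.is_dir()):
        subtree = explore_directory(child)
        if subtree:
            # Keep only branches that lead to a features.pt file.
            tree[child.name] = subtree
    return tree


# Demo on a throwaway directory tree (names are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "extractor_a" / "graph_b").mkdir(parents=True)
    (root / "extractor_a" / "graph_b" / "features.pt").touch()
    (root / "empty_dir").mkdir()
    tree = explore_directory(root)

print(tree)
```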

_get_available_options_at_level(structure: Dict[str, Any], path_parts: List[str]) → List[str]

Get available options at a specific level in the directory structure.

_can_load_features(structure: Dict[str, Any], path_parts: List[str]) → bool

Check if we can load features at the current path.

_get_features_path(structure: Dict[str, Any], path_parts: List[str]) → str

Get the full path to the features.pt file for the given path parts.
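The three lookup helpers above all walk the nested structure along `path_parts`. A sketch of that pattern follows, assuming a structure in which sub-directories are nested dictionaries and a hypothetical `_features_file` key marks a loadable level. The real structure and the function names used by the library may differ.

```python
from typing import Any, Dict, List, Optional

FEATURES_KEY = "_features_file"  # hypothetical marker key

# Example structure with made-up directory names.
structure: Dict[str, Any] = {
    "extractor_a": {
        "graph_b": {FEATURES_KEY: "extractor_a/graph_b/features.pt"},
    },
    "extractor_c": {FEATURES_KEY: "extractor_c/features.pt"},
}


def descend(structure: Dict[str, Any], path_parts: List[str]) -> Optional[Dict[str, Any]]:
    """Follow path_parts through the tree; return None if the path is invalid."""
    node: Any = structure
    for part in path_parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, dict) else None


def options_at_level(structure: Dict[str, Any], path_parts: List[str]) -> List[str]:
    node = descend(structure, path_parts)
    if node is None:
        return []
    # Only sub-directories are selectable options, not the marker key.
    return sorted(k for k, v in node.items() if isinstance(v, dict))


def can_load_features(structure: Dict[str, Any], path_parts: List[str]) -> bool:
    node = descend(structure, path_parts)
    return node is not None and FEATURES_KEY in node


def get_features_path(structure: Dict[str, Any], path_parts: List[str]) -> str:
    node = descend(structure, path_parts)
    if node is None or FEATURES_KEY not in node:
        raise KeyError(f"No features.pt at {'/'.join(path_parts)}")
    return node[FEATURES_KEY]
```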

_load_features(slide_name: str, path_parts: List[str])

Load features for the specified slide and path parts.

_prepare_data(slide_name: str, path_parts: List[str]) → Dict[str, Any]

Prepare data for visualization by loading features and converting to DataFrame.

_load_cell_types(slide_name: str, path_parts: List[str]) → Optional[Dict[int, int]]

Load cell types for the specified slide and path parts. Returns a dictionary mapping cell_id to cell_type.

_prepare_data_with_cell_types(slide_name: str, path_parts: List[str]) → Dict[str, Any]

Prepare data with cell types for visualization.

_prepare_combined_data(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]

Prepare combined data from multiple slides for dataset-wide analysis. Samples up to max_samples_per_slide from each slide. If max_samples_per_slide is None, use all cells from each slide.
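The per-slide sampling rule can be illustrated as below. This sketch assumes the features of each slide are already loaded into arrays. The function name, the `seed` parameter and the returned pair are hypothetical, since the real method returns a dictionary whose keys are not documented here.

```python
from typing import Dict, Optional, Tuple

import numpy as np


def combine_slides(
    per_slide_features: Dict[str, np.ndarray],
    max_samples_per_slide: Optional[int] = 1000,
    seed: int = 0,  # hypothetical parameter, added for reproducibility
) -> Tuple[np.ndarray, np.ndarray]:
    """Stack features from several slides, capping the rows taken per slide."""
    rng = np.random.default_rng(seed)
    chunks, slide_labels = [], []
    for slide_idx, feats in enumerate(per_slide_features.values()):
        # None means "use all cells from this slide".
        if max_samples_per_slide is not None and len(feats) > max_samples_per_slide:
            idx = rng.choice(len(feats), size=max_samples_per_slide, replace=False)
            feats = feats[idx]
        chunks.append(feats)
        # Remember which slide each row came from.
        slide_labels.append(np.full(len(feats), slide_idx))
    return np.vstack(chunks), np.concatenate(slide_labels)
```

The integer slide labels returned alongside the features are the kind of input the combined PCA and t-SNE plots take as `slide_labels`.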

_prepare_combined_data_with_cell_types(slides: List[str], path_parts: List[str], max_samples_per_slide: int | None = 1000) → Dict[str, Any]

Prepare combined data with cell types from multiple slides. If max_samples_per_slide is None, use all cells from each slide.

_calculate_first_order_stats(data: ndarray[Any, Any]) → Dict[str, Any]

Calculate first-order statistics for features.
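Which statistics are included is not listed here. As an illustration, the sketch below computes a typical set (mean, standard deviation, minimum, maximum, median) per feature column with NumPy. Both the choice of statistics and the dictionary keys are assumptions.

```python
from typing import Any, Dict

import numpy as np


def first_order_stats(data: np.ndarray) -> Dict[str, Any]:
    """Per-feature summary statistics; rows are cells, columns are features."""
    return {
        "mean": data.mean(axis=0),
        "std": data.std(axis=0),
        "min": data.min(axis=0),
        "max": data.max(axis=0),
        "median": np.median(data, axis=0),
    }


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
stats = first_order_stats(data)
print(stats["mean"])
```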

_create_correlation_matrix(df: DataFrame, feature_names: List[str]) → Figure

Create correlation matrix heatmap for features.

_create_distribution_plot(df: DataFrame, feature_name: str) → Figure

Create distribution plot for a specific feature.

_create_pca_plot(features: ndarray[Any, Any], feature_names: List[str], n_samples: int = 1000) → Figure

Create PCA visualization.

_create_tsne_plot(features: ndarray[Any, Any], n_samples: int = 1000, perplexity: int = 30) → Figure

Create t-SNE visualization.

_create_stats_table(stats_dict: Dict[str, Any], feature_names: List[str]) → Figure

Create a table with first-order statistics.

_create_distribution_comparison_plot(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure

Create overlaid distribution plots for different cell types with normalized densities and KDE curves.

_calculate_js_divergence_table(df: DataFrame, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], reference_cell_type: int) → DataFrame

Calculate Jensen-Shannon divergence between reference cell type and all other types for each feature in the dataframe.

Returns a DataFrame where:
- Rows are features
- Columns are cell types (excluding the reference type)
- Values are JS divergence scores
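For one feature and one pair of cell types, a Jensen-Shannon divergence can be estimated from histograms over shared bins. The sketch below is one way to do this. The bin count, the base-2 logarithm (which bounds the score in [0, 1]) and the function name are assumptions, and the library may estimate the distributions differently.

```python
import numpy as np


def js_divergence(x: np.ndarray, y: np.ndarray, bins: int = 30) -> float:
    """Histogram-based Jensen-Shannon divergence (base 2) between two samples."""
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    if lo == hi:
        # Both samples are the same constant: identical distributions.
        return 0.0
    edges = np.linspace(lo, hi, bins + 1)  # shared bins for both samples
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Synthetic values for illustration only.
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 1.0, size=500)
other = rng.uniform(10.0, 11.0, size=500)
print(js_divergence(reference, reference), js_divergence(reference, other))
```

Applying such a function to each feature column, with the reference cell type against every other type, would fill the table described above.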

_create_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000) → Figure

Create PCA visualization colored by cell type.

_create_tsne_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], n_samples: int = 1000, perplexity: int = 30) → Figure

Create t-SNE visualization colored by cell type.

_create_combined_pca_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure

Create PCA visualization colored by slide for combined dataset.

_create_combined_tsne_plot(features: ndarray[Any, Any], slide_labels: ndarray[Any, Any], slides: List[str], perplexity: int = 30) → Figure

Create t-SNE visualization colored by slide for combined dataset.

_create_combined_cell_type_distribution(cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], slide_labels: ndarray[Any, Any], slides: List[str]) → Figure

Create stacked bar chart showing cell type distribution across slides.

_create_combined_distribution_comparison(df: DataFrame, feature_name: str, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure

Create distribution comparison across all cell types for combined dataset.

_create_combined_pca_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str]) → Figure

Create PCA visualization colored by cell type for combined dataset.

_create_combined_tsne_by_cell_type(features: numpy.ndarray[Any, Any] | torch.Tensor, cell_types: ndarray[Any, Any], cell_type_names: Dict[int, str], perplexity: int = 30) → Figure

Create t-SNE visualization colored by cell type for combined dataset.

visualize(host: str = '127.0.0.1', port: int = 8050, debug: bool = True)

Launch the Dash web application for feature visualization.

Modules