glasscut.dataset package

Submodules

glasscut.dataset.generator module

Dataset generation orchestration for multi-slide tiling workflows.

class glasscut.dataset.generator.DatasetGenerator(dataset_id, output_dir, *, tiler, n_workers=4, batch_size=128, save_thumbnails=True, save_masks=True, save_processed_json=True, show_progress=True, verbose=True)

Bases: object

Generate a tile dataset from one or more slide files.

__init__(dataset_id, output_dir, *, tiler, n_workers=4, batch_size=128, save_thumbnails=True, save_masks=True, save_processed_json=True, show_progress=True, verbose=True)

Initialize the generator from explicit constructor arguments.

Parameters:
  • dataset_id (str) – Dataset identifier.

  • output_dir (str | Path) – Output root directory.

  • tiler (Tiler) – Preconfigured tiler instance used for extraction.

  • n_workers (int, optional) – Number of workers for batched tile extraction. Default is 4.

  • batch_size (int, optional) – Number of tiles per extraction batch. Default is 128.

  • save_thumbnails (bool, optional) – Whether to save slide thumbnail artifacts. Default is True.

  • save_masks (bool, optional) – Whether to save tissue mask artifacts. Default is True.

  • save_processed_json (bool, optional) – Whether to save processed.json at the dataset root. Default is True.

  • show_progress (bool, optional) – Whether to display progress bars for slides and tiles. Default is True.

  • verbose (bool, optional) – Whether to enable info-level logging. Default is True.

Return type:

None
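The n_workers and batch_size parameters describe batched, concurrent tile extraction. As a rough illustration of that general pattern (not glasscut's implementation), the sketch below splits tile coordinates into batches of at most batch_size items and processes them on a pool of n_workers threads. The function names and the extract_batch callable are hypothetical, and using threads rather than processes is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # e.g. a tile coordinate
R = TypeVar("R")  # e.g. an extracted tile


def make_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def extract_in_batches(
    coords: Sequence[T],
    extract_batch: Callable[[List[T]], List[R]],
    *,
    n_workers: int = 4,
    batch_size: int = 128,
) -> List[R]:
    """Run extract_batch on each batch concurrently and flatten the results in input order."""
    batches = make_batches(coords, batch_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in submission order, regardless of completion order.
        per_batch = list(pool.map(extract_batch, batches))
    return [tile for batch in per_batch for tile in batch]
```

With 300 coordinates and the default batch_size of 128, make_batches yields batches of 128, 128, and 44 items. Because Executor.map returns results in submission order, the flattened output keeps the input order even when batches finish out of order.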

process_dataset(slide_paths)

Process all provided slides and persist tiles, artifacts, and metadata.

Parameters:

slide_paths (Sequence[str | Path]) – Paths of the slide files to process.

Return type:

DatasetMetadata

glasscut.dataset.live module

Live in-memory slide-level dataset utilities.

This module provides a dataset-like interface where each item corresponds to one slide and contains all extracted tiles for that slide, without writing artifacts to disk.

class glasscut.dataset.live.LiveSlideSample(slide_id, slide_name, slide_path, dimensions, mpp, magnifications, tiles)

Bases: object

Container for one in-memory slide sample.

Attributes (also the constructor parameters, in this order):
  • slide_id (str) – Slide identifier in slide_XXX format.

  • slide_name (str) – Slide basename without extension.

  • slide_path (str) – Absolute slide path.

  • dimensions (tuple[int, int]) – Level-0 dimensions as (width, height), in pixels.

  • mpp (float) – Microns per pixel at level 0.

  • magnifications (list[float]) – Available magnification values.

  • tiles (list[Tile]) – All extracted tiles for this slide, in extraction order.
__init__(slide_id, slide_name, slide_path, dimensions, mpp, magnifications, tiles)

Return type:

None
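LiveSlideSample appears to be a plain dataclass: its fields are annotated and __init__ takes them in declaration order. The sketch below mirrors that field layout in a hypothetical SlideSampleSketch class to show how dimensions and mpp combine: the physical extent of level 0 in microns is the pixel count multiplied by microns per pixel. The names_from_path helper is likewise hypothetical; it only illustrates the documented meanings of slide_name and slide_path and is not necessarily how glasscut derives them.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class SlideSampleSketch:
    """Hypothetical stand-in with the same fields, in the same order, as LiveSlideSample."""

    slide_id: str
    slide_name: str
    slide_path: str
    dimensions: tuple[int, int]  # level-0 (width, height) in pixels
    mpp: float  # microns per pixel at level 0
    magnifications: list[float]
    tiles: list[Any]  # stand-in for list[Tile]

    @property
    def physical_size_um(self) -> tuple[float, float]:
        """Level-0 extent in microns: pixel count times microns per pixel."""
        width, height = self.dimensions
        return (width * self.mpp, height * self.mpp)


def names_from_path(path: str | Path) -> tuple[str, str]:
    """Return (slide_name, slide_path): basename without extension, and absolute path."""
    p = Path(path)
    return p.stem, str(p.absolute())
```

For example, a 100000 × 50000 pixel slide at 0.25 µm per pixel spans 25000 × 12500 µm. Note that Path.stem strips only the last suffix, so in this sketch a file named case1.ome.tiff would yield case1.ome.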

class glasscut.dataset.live.LiveSlideDataset(slide_paths, *, tiler, n_workers=4, batch_size=128, use_cucim=True)

Bases: object

Slide-level in-memory dataset.

Each __getitem__ call opens one slide, extracts all tiles in memory using the configured tiler, and returns a LiveSlideSample.

__init__(slide_paths, *, tiler, n_workers=4, batch_size=128, use_cucim=True)

Return type:

None

__len__()

Return the number of slides in the dataset.

Return type:

int

__getitem__(index)

Extract all tiles for one slide and return them, together with the slide metadata, as a LiveSlideSample.

Parameters:

index (int) – Slide index in the dataset.

Return type:

LiveSlideSample
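The lazy, one-slide-per-index behaviour described above is the standard map-style dataset pattern: __len__ reports the number of slides and __getitem__ does the expensive work on demand. The sketch below is a minimal illustration under stated assumptions, not glasscut's code: a hypothetical extract_tiles callable stands in for the configured tiler, and a simplified, hypothetical LazySample stands in for LiveSlideSample.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Sequence


@dataclass
class LazySample:
    """Hypothetical, simplified stand-in for LiveSlideSample."""

    slide_path: str
    tiles: list[object]


class LazySlideDatasetSketch:
    """Hypothetical map-style dataset: one item per slide, tiles extracted on access."""

    def __init__(
        self,
        slide_paths: Sequence[str | Path],
        extract_tiles: Callable[[str], list[object]],
    ) -> None:
        # Only the paths are stored; no slide is opened at construction time.
        self._paths = [str(p) for p in slide_paths]
        self._extract_tiles = extract_tiles

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, index: int) -> LazySample:
        # List indexing raises IndexError past the end, which also ends for-loops.
        path = self._paths[index]
        # Extraction happens here, on every access; nothing is cached between calls.
        return LazySample(slide_path=path, tiles=self._extract_tiles(path))
```

Because each item holds every tile of one slide, peak memory in this sketch is governed by the largest slide rather than the whole collection, provided the caller does not keep earlier samples alive. Repeated access to the same index re-runs extraction.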

Module contents

Dataset generation module.

The package namespace exposes the following classes. Their documentation is identical to the submodule entries above:

  • class glasscut.dataset.DatasetGenerator – same as glasscut.dataset.generator.DatasetGenerator.

  • class glasscut.dataset.LiveSlideDataset – same as glasscut.dataset.live.LiveSlideDataset.

  • class glasscut.dataset.LiveSlideSample – same as glasscut.dataset.live.LiveSlideSample.