cellmil.data
- class cellmil.data.PatchExtractor(config: PatchExtractorConfig)
Bases: object
Class for preparing data from whole-slide images (WSI).
- __init__(config: PatchExtractorConfig) -> None
- _set_hardware(hardware_selection: str = 'cucim') -> None
Load either cuCIM (GPU-accelerated) or OpenSlide.
- Parameters:
hardware_selection (str, optional) – Backend used to read the WSI. Intended for experiments only. Must be either "openslide" or "cucim". Defaults to "cucim".
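The backend choice described above can be sketched as a small validation step. This is only an illustration, not the library's implementation: `resolve_backend`, `SUPPORTED_BACKENDS`, and the `is_installed` hook are hypothetical names, and the sketch assumes the two backends are importable as the modules `cucim` and `openslide`.

```python
import importlib.util

# Assumption: these are the only two accepted values, as stated in the parameter description.
SUPPORTED_BACKENDS = ("cucim", "openslide")


def resolve_backend(hardware_selection="cucim", is_installed=None):
    """Validate the requested backend name and check that its module can be imported."""
    if is_installed is None:
        # Default check: look the top-level module up without importing it.
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    if hardware_selection not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"hardware_selection must be one of {SUPPORTED_BACKENDS}, got {hardware_selection!r}"
        )
    if not is_installed(hardware_selection):
        raise ImportError(f"Backend {hardware_selection!r} is not installed")
    return hardware_selection


# Demo with a stubbed availability check so the example does not depend on the environment.
chosen = resolve_backend("openslide", is_installed=lambda name: True)
```

Passing the availability check in as a parameter keeps the sketch testable on machines without a GPU or without either library installed.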
- _set_wsi_path(wsi_path: str | pathlib.Path) -> None
Set the path to the WSI file.
- Parameters:
wsi_paths (Union[str, Path, List]) – Path to the folder where all WSI are stored or path to a single WSI-file.
- get_patches() -> None
Main function to create a dataset. Samples the complete dataset.
This function is the entry point to patch processing. When it is called, all WSIs that have been detected are processed. Depending on the selected configuration, WSIs that have already been processed are either removed from the dataset (skipped) or processed again. The processed WSIs are recorded in the file processed.json in the output folder.
- _prepare_wsi(wsi_file: Path) -> tuple[tuple[int, int], tuple[dict[str, Any], dict[str, PIL.Image.Image], dict[str, PIL.Image.Image], dict[str, PIL.Image.Image]], tuple[list[Any], Any, list[shapely.geometry.polygon.Polygon], list[str]]]
Prepare a WSI for preprocessing.
First, some sanity checks are performed and the target level for the DeepZoomGenerator is calculated. Instead of OpenSlide's default DeepZoomGenerator, a much faster one based on the cupy library is used (cf. https://github.com/rapidsai/cucim). A core step is to find all non-background patches. For this, a tissue mask is generated. At this stage, no patches are extracted.
For further documentation (i.e., configuration settings), see the class documentation [link].
- Parameters:
wsi_file (str) – Name of the wsi file
- Raises:
WrongParameterException – The level resulting from the target magnification or downsampling factor must exist in order to extract patches.
- Returns:
Tuple[int, int]: Number of rows and columns of the WSI at the given level.
dict: Dictionary with metadata of the WSI.
dict[str, Image]: Masks generated during tissue detection. Keys are the mask names and values are the PIL images.
dict[str, Image]: Annotation masks for the provided annotations, covering the complete WSI. The masks have the same size as the tissue masks. Keys are the mask names and values are the PIL images.
dict[str, Image]: Thumbnail images at different downsampling factors and resolutions. Keys are the thumbnail names and values are the PIL images.
callable: Batch-processing function that performs the actual patch extraction.
List[List[Tuple]]: List of batches, each of length batch-size. Each batch element contains the row and column position of a patch together with its background ratio.
- Return type:
Tuple[Tuple[int, int], Tuple[dict, dict, dict, dict], Callable, List[List[Tuple]]]
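The patch selection and batching described above can be illustrated with a minimal, dependency-free sketch. It is not the library's code: `find_tissue_patches`, `split_into_batches`, `patch_px`, and `min_tissue_ratio` are hypothetical names, the tissue mask is assumed to be a 2D grid of booleans, and the rule "keep a grid cell if its tissue fraction reaches a threshold" is an assumption about how non-background patches are chosen.

```python
def find_tissue_patches(tissue_mask, patch_px, min_tissue_ratio=0.01):
    """Return (row, col, background_ratio) for every grid cell with enough tissue.

    tissue_mask: list of rows, each a list of booleans (True where tissue was detected).
    patch_px: edge length of one patch, measured in mask pixels (hypothetical parameter).
    """
    n_rows = len(tissue_mask) // patch_px
    n_cols = len(tissue_mask[0]) // patch_px
    kept = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Collect the mask values that fall into this grid cell.
            cell = [
                tissue_mask[r][c]
                for r in range(row * patch_px, (row + 1) * patch_px)
                for c in range(col * patch_px, (col + 1) * patch_px)
            ]
            tissue_ratio = sum(cell) / len(cell)
            if tissue_ratio >= min_tissue_ratio:
                kept.append((row, col, 1.0 - tissue_ratio))
    return kept


def split_into_batches(items, batch_size):
    """Divide a flat list into consecutive batches of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Toy 4x6 mask: one cell fully covered by tissue, one cell covered to a quarter.
mask = [[False] * 6 for _ in range(4)]
for r in range(2):
    for c in range(2):
        mask[r][c] = True
mask[2][4] = True

patches = find_tissue_patches(mask, patch_px=2)
batches = split_into_batches(patches, batch_size=1)
```

Each batch element carries the grid position and the background ratio, matching the (row, col, bg-ratio) tuples described in the return value; no pixel data is read at this stage.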
- process_queue(batch: List[tuple[int, int, float]], wsi_file: Path, wsi_metadata: dict[str, Any], level: int, polygons: List[Polygon], region_labels: List[str], store: Storage) -> tuple[int, dict[int, int], list[dict[str, dict[str, Any]]]]
Extract patches for a list of coordinates using multiprocessing queues.
Patches are extracted at their coordinates with the given patch settings (size, overlap). Patch annotation masks can be stored, and context patches with the same shape can be retrieved. Optionally, stains can be normalized using Macenko normalization.
- Parameters:
batch (List[Tuple[int, int, float]]) – A batch of patch coordinates (row, col, background ratio)
wsi_file (Union[Path, str]) – Path to the WSI file from which the patches should be extracted
wsi_metadata (dict) – Dictionary with important WSI metadata
level (int) – The tile level for sampling.
polygons (List[Polygon]) – Annotations of this WSI as a list of polygons (referenced to highest level of WSI). If no annotations, pass an empty list [].
region_labels (List[str]) – List of labels for the annotations provided as the polygons parameter. If no annotations, pass an empty list [].
store (Storage) – Storage object passed to each worker to store the files
- Returns:
Number of processed patches
- Return type:
- _drop_processed_files(processed_files: list[str]) -> None
Drop files that are already listed in processed.json from the dataset.
- _check_overwrite(overwrite: bool = False) -> None
Performs data cleanup, depending on overwrite.
If True, patches that already exist are overwritten. If False, files already listed in processed.json in the provided output path (created during class initialization) are skipped.
- Parameters:
overwrite (bool, optional) – Overwrite flag. Defaults to False.
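The skip-or-overwrite behaviour can be sketched as follows. This is a hedged illustration only: `select_files_to_process` is a hypothetical helper, and the layout of processed.json (a `"processed_files"` key holding slide names without extension) is an assumption, since the actual file format is not documented here.

```python
import json
import tempfile
from pathlib import Path


def select_files_to_process(wsi_files, output_dir, overwrite=False):
    """Return the WSI files that still need processing."""
    processed_path = Path(output_dir) / "processed.json"
    if overwrite or not processed_path.exists():
        # Overwrite requested, or nothing recorded yet: process everything.
        return list(wsi_files)
    # Assumed layout: {"processed_files": ["slide_a", ...]}
    processed = set(json.loads(processed_path.read_text())["processed_files"])
    return [f for f in wsi_files if Path(f).stem not in processed]


# Demo in a temporary output folder that already records one processed slide.
with tempfile.TemporaryDirectory() as out_dir:
    (Path(out_dir) / "processed.json").write_text(
        json.dumps({"processed_files": ["slide_a"]})
    )
    files = [Path("slide_a.svs"), Path("slide_b.svs")]
    skipped = select_files_to_process(files, out_dir, overwrite=False)
    redone = select_files_to_process(files, out_dir, overwrite=True)
```

With `overwrite=False` only the unprocessed slide remains; with `overwrite=True` both slides are returned for processing.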
- static _check_patch_params(patch_size: int, patch_overlap: int, downsample: Optional[int] = None, target_mag: Optional[float] = None, level: Optional[int] = None, min_background_ratio: float = 0.01) -> None
Sanity check for the patch parameters.
See the Raises section for details on the individual checks.
- Parameters:
patch_size (int) – The size of the patches in pixel that will be retrieved from the WSI, e.g. 256 for 256px
patch_overlap (int) – The number of pixels by which two neighboring patches overlap.
downsample (int, optional) – Downsampling factor from the highest level (largest resolution). Defaults to None.
target_mag (float, optional) – If this parameter is provided, the output level of the WSI corresponds to the level that is at the target magnification of the WSI. Alternative to downsample and level. Defaults to None.
level (int, optional) – The tile level for sampling, alternative to downsample. Defaults to None.
min_background_ratio (float, optional) – Minimum background selection ratio. Must be between 0 and 1. Defaults to 0.01.
- Raises:
WrongParameterException – Either downsample, level, or target_mag must have been selected.
WrongParameterException – Downsampling must be a power of two.
WrongParameterException – Negative overlap is not allowed.
WrongParameterException – Overlap should not be larger than half of the patch size.
WrongParameterException – Background Percentage must be between 0 and 1.
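The checks listed under Raises can be sketched as a standalone function. This is an illustrative reconstruction from the list above, not the library's source: the `WrongParameterException` class defined here is a local stand-in, the error messages are invented, and treating patch_overlap as a pixel count follows the parameter description.

```python
class WrongParameterException(Exception):
    """Local stand-in for the library's exception of the same name."""


def check_patch_params(patch_size, patch_overlap, downsample=None,
                       target_mag=None, level=None, min_background_ratio=0.01):
    """Raise WrongParameterException if the patch settings are inconsistent."""
    if downsample is None and target_mag is None and level is None:
        raise WrongParameterException(
            "Either downsample, level, or target_mag must be selected."
        )
    # A positive integer is a power of two iff it has exactly one bit set.
    if downsample is not None and (downsample < 1 or downsample & (downsample - 1)):
        raise WrongParameterException("Downsampling must be a power of two.")
    if patch_overlap < 0:
        raise WrongParameterException("Negative overlap is not allowed.")
    if patch_overlap > patch_size / 2:
        raise WrongParameterException(
            "Overlap should not be larger than half of the patch size."
        )
    if not 0 <= min_background_ratio <= 1:
        raise WrongParameterException("Background ratio must be between 0 and 1.")


def is_rejected(**params):
    """Helper for the demo: True if the parameters fail the sanity check."""
    try:
        check_patch_params(**params)
    except WrongParameterException:
        return True
    return False


valid = not is_rejected(patch_size=256, patch_overlap=64, downsample=2)
bad_downsample = is_rejected(patch_size=256, patch_overlap=0, downsample=3)
```

Here `valid` is True because all five conditions hold, while `bad_downsample` is True because 3 is not a power of two.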
Modules