Usage

Run options

To start the feature extraction process, make sure you have followed the manual installation procedure. Then run polarityjam on the command line to see the available run modes. There are three options to start the feature extraction process: run, run_stack, and run_key, which are summarized in the table below.

| Mode | Arguments | Description |
| --- | --- | --- |
| run | paramfile.yml, input.tif, outputpath | Should be used when a single image needs to be processed. |
| run_stack | paramfile.yml, inputpath, outputpath | Should be used when a set of images in a folder needs to be processed. |
| run_key | paramfile.yml, inputpath, inputkey.csv, outputpath | Should be used when the images that need to be processed have a complex folder structure with multiple sub-folders that need to be excluded from the analysis. |

The following examples show how to run the feature extraction process using the polarityjam command line tool:

# Run a single image
polarityjam run paramfile.yml input.tif outputpath

# Run a stack of images
polarityjam run_stack paramfile.yml inputpath outputpath

# Run a set of images with a complex folder structure
polarityjam run_key paramfile.yml inputpath inputkey.csv outputpath
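When the tool is driven from a script, the invocations can be assembled programmatically. The following is a minimal sketch and not part of polarityjam: the helper `build_command` and its keyword names are hypothetical, and the mode names and argument order are taken from the run options table above. Check the output of `polarityjam` on your system if the modes are listed differently.

```python
import shlex

# Positional arguments per mode, in the order given in the run options table.
MODE_ARGUMENTS = {
    "run": ("paramfile", "input_image", "outputpath"),
    "run_stack": ("paramfile", "inputpath", "outputpath"),
    "run_key": ("paramfile", "inputpath", "inputkey", "outputpath"),
}


def build_command(mode, **paths):
    """Return the argument list for one polarityjam call (hypothetical helper)."""
    if mode not in MODE_ARGUMENTS:
        raise ValueError(f"unknown mode: {mode!r}")
    expected = MODE_ARGUMENTS[mode]
    missing = [name for name in expected if name not in paths]
    if missing:
        raise ValueError(f"mode {mode!r} is missing: {', '.join(missing)}")
    return ["polarityjam", mode] + [str(paths[name]) for name in expected]


cmd = build_command(
    "run_key",
    paramfile="paramfile.yml",
    inputpath="inputpath",
    inputkey="inputkey.csv",
    outputpath="outputpath",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute the call.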

Parameter file

The most important argument to provide for all modes is the parameter.yml file. This .yml file specifies how the feature extraction pipeline treats the data and which extraction steps it performs. You might want to look at this example parameter file.

The following tables list and describe all options that are available for executing the pipeline. Although they are grouped into four categories (image, segmentation, runtime, and plot), they can all be defined in a single parameter.yml file.
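As an illustration, the content of a parameter file covering several categories could be generated as follows. This is a sketch under assumptions: a flat `key: value` layout is assumed, the helper `to_yaml_lines` is hypothetical, the channel indices are made up for a three-channel image, and the remaining values are the documented defaults. The example parameter file remains the authoritative reference for the layout.

```python
# Illustrative values: channel indices are invented for a three-channel image,
# all other entries use the defaults listed in the parameter tables.
params = {
    "channel_junction": 0,
    "channel_nucleus": 2,
    "channel_organelle": 1,
    "channel_expression_marker": -1,  # -1 means: no such channel
    "pixel_to_micron_ratio": 1.0,
    "segmentation_algorithm": "CellposeSegmenter",
    "estimated_cell_diameter": 100,
    "use_gpu": False,
    "cue_direction": 0,
    "plot_junctions": True,
    "graphics_output_format": ["png", "pdf"],
    "dpi": 300,
}


def to_yaml_lines(mapping):
    """Render a flat dict as YAML lines (hypothetical helper, flat layout assumed)."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, bool):  # must be checked before numbers: bool is an int subclass
            text = "true" if value else "false"
        elif isinstance(value, list):
            text = "[" + ", ".join(f'"{item}"' for item in value) + "]"
        elif isinstance(value, str):
            text = f'"{value}"'
        else:
            text = str(value)
        lines.append(f"{key}: {text}")
    return lines


print("\n".join(to_yaml_lines(params)))
```

Saving the printed text as parameter.yml gives a file that can be passed as the first argument of each run mode.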

Image Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| channel_junction | image | integer | | -1, 0, 1, 2 | Specifies which channel in the input image(s) holds information about the junction signals. Use -1 to indicate that there is no such channel. |
| channel_nucleus | image | integer | | -1, 0, 1, 2 | Specifies which channel in the input image(s) holds information about the nucleus. Use -1 to indicate that there is no such channel. |
| channel_organelle | image | integer | | -1, 0, 1, 2 | Specifies which channel in the input image(s) holds information about the organelle (e.g. Golgi apparatus). Use -1 to indicate that there is no such channel. |
| channel_expression_marker | image | integer | | -1, 0, 1, 2 | Specifies which channel in the input image(s) holds information about the expression marker. Use -1 to indicate that there is no such channel. |
| pixel_to_micron_ratio | image | float | 1 | | Specifies the pixel-to-micron ratio, i.e. how many micrometers one pixel corresponds to. Default is 1. |
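The role of pixel_to_micron_ratio can be illustrated with a small conversion sketch. It assumes, following the description above, that the ratio is the number of micrometers per pixel, so lengths scale linearly and areas quadratically. The function names are hypothetical and not part of polarityjam.

```python
def pixels_to_microns(length_px, pixel_to_micron_ratio=1.0):
    """Convert a length in pixels to micrometers (ratio = micrometers per pixel)."""
    return length_px * pixel_to_micron_ratio


def area_pixels_to_square_microns(area_px, pixel_to_micron_ratio=1.0):
    """Convert an area in pixels to square micrometers (scales with the ratio squared)."""
    return area_px * pixel_to_micron_ratio ** 2


# With 0.5 micrometers per pixel, a cell that is 100 pixels wide spans 50 micrometers.
print(pixels_to_microns(100, 0.5))              # 50.0
print(area_pixels_to_square_microns(400, 0.5))  # 100.0
```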

Cellpose Segmentation Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| manually_annotated_mask | segmentation | string | | | PolarityJaM looks for an available segmentation in the input path. This parameter specifies the suffix for manually annotated masks. Leave empty to use the suffix “_seg.npy” (cellpose default). |
| store_segmentation | segmentation | bool | False | True, False | If true, stores the cellpose segmentation masks in the input path (CAUTION: not in the output path!). |
| use_given_mask | segmentation | bool | True | True, False | Indicates whether to use the masks in the input path (if any) or not. Default is true. |
| model_type | segmentation | string | “cyto” | “custom”, <model type> | The model type supported by your segmentation algorithm. For cellpose, “cyto”, “cyto2”, and “custom” are possible. If “custom” is chosen, “model_path” must be set. |
| model_path | segmentation | string | “” | | The path to the custom model for your segmentation algorithm. Only works in combination with “model_type”. |
| estimated_cell_diameter | segmentation | integer | 100 | 0 - inf | The estimated diameter of the cells in your input image(s). Default is 100 pixels. |
| estimated_nucleus_diameter | segmentation | integer | 30 | 0 - inf | The estimated diameter of the nuclei in your input image(s). Default is 30 pixels. |
| flow_threshold | segmentation | float | 0.4 | | Increase this threshold if cellpose is not returning as many ROIs as you would expect. Similarly, decrease this threshold if cellpose is returning too many ill-shaped ROIs. |
| cellprob_threshold | segmentation | float | 0.0 | | Decrease this threshold if cellpose is not returning as many ROIs as you would expect. Increase this threshold if cellpose is returning too many ROIs, particularly from dim areas. |
| use_gpu | segmentation | bool | False | True, False | Indicates whether to use the GPU for faster segmentation. Default is false. |
| channel_cell_segmentation | segmentation | string | “channel_junction” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is “channel_junction”. |
| channel_nuclei_segmentation | segmentation | string | “channel_nucleus” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is “channel_nucleus”. |
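The dependency between the model type and the model path can be checked before starting a run. The following is a sketch only: `check_cellpose_params` is a hypothetical helper, not a polarityjam function, and it merely encodes the rule stated in the table (a “custom” model type needs a model path) together with the documented lower bound of the diameter.

```python
def check_cellpose_params(params):
    """Return a list of problems found in Cellpose-related settings (hypothetical helper)."""
    problems = []
    model_type = params.get("model_type", "cyto")  # documented default
    model_path = params.get("model_path", "")      # documented default
    if model_type == "custom" and not model_path:
        problems.append('model_type is "custom" but model_path is empty')
    if model_type != "custom" and model_path:
        problems.append('model_path is set but model_type is not "custom"')
    if params.get("estimated_cell_diameter", 100) < 0:
        problems.append("estimated_cell_diameter must not be negative")
    return problems


print(check_cellpose_params({"model_type": "custom"}))
print(check_cellpose_params({"model_type": "cyto2", "estimated_cell_diameter": 80}))
```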

DeepCell Segmentation Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| segmentation_mode | segmentation | string | “whole-cell” | “whole-cell”, “nuclear” | Determines the segmentation mode. Either “whole-cell” or “nuclear”. |
| save_mask | segmentation | bool | True | True, False | Stores masks on disk in numpy format. |
| maxima_threshold | segmentation | float | 0.18 | 0 - inf | Can be adjusted during postprocessing to correct specific and consistent errors in your data. Lower values will result in more cells being detected. Higher values will result in fewer cells being detected. |
| maxima_smooth | segmentation | float | 0.1 | 0 - inf | Controls what the model considers a unique cell. Lower values will result in more separate cells being predicted, whereas higher values will result in fewer cells. |
| interior_threshold | segmentation | float | 0.1 | 0 - inf | Controls how conservative the model is in estimating what is a cell vs. what is background. Lower values will result in larger cells, whereas higher values will result in smaller cells. |
| small_objects_threshold | segmentation | integer | 25 | 0 - inf | Minimal size in pixels an object must have to be detected as such. |
| fill_holes_threshold | segmentation | integer | 5 | 0 - inf | Fills any holes contained in the predicted object up to a certain size. |
| pixel_expansion | segmentation | integer | 0 | 0 - inf | Expands the predicted object by a certain number of pixels. |
| channel_cell_segmentation | segmentation | string | “channel_junction” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is “channel_junction”. |
| channel_nuclei_segmentation | segmentation | string | “channel_nucleus” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is “channel_nucleus”. |

Segment Anything Segmentation Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| model_url | segmentation | url | “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth” | “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth”, “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth”, “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth”, or any other SAM-provided link | URL from which to retrieve the model weights. Please look at segmentanything for a curated list! Weights will be downloaded only once! |
| model_name | segmentation | string | “sam_vit_h” | “sam_vit_h”, “sam_vit_l”, “sam_vit_b” | Name of the model to use. Please look at segmentanything for a curated list! |
| channel_cell_segmentation | segmentation | string | “channel_junction” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is “channel_junction”. |
| channel_nuclei_segmentation | segmentation | string | “channel_nucleus” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is “channel_nucleus”. |
| channel_organelle_segmentation | segmentation | string | “channel_organelle” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the organelle segmentation. Default is “channel_organelle”. |

MicroSAM Segmentation Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| model_name | segmentation | string | “sam_vit_h” | “sam_vit_h”, “sam_vit_l”, “sam_vit_b” | Name of the model to use. See MicroSam for information. |
| checkpoint_path | segmentation | string | “” | “” | Path to the checkpoint file. |
| embedding_path | segmentation | string | “” | “” | Path to the embedding file. |
| pred_iou_thresh | segmentation | float | 0.8 | 0 - 1 | Threshold for the predicted IoU. |
| channel_cell_segmentation | segmentation | string | “channel_junction” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is “channel_junction”. |
| channel_nuclei_segmentation | segmentation | string | “channel_nucleus” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is “channel_nucleus”. |
| channel_organelle_segmentation | segmentation | string | “channel_organelle” | “channel_junction”, “channel_nucleus”, “channel_organelle”, “channel_expression_marker” | Specifies which channel in the input image(s) should be used to perform the organelle segmentation. Default is “channel_organelle”. |

Runtime Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| extract_group_features | runtime | bool | False | True, False | If true, extracts group features based on a feature of interest. |
| membrane_thickness | runtime | integer | 5 | 0 - inf | Expected membrane thickness. |
| junction_threshold | runtime | float | -1 | 0 - inf | Parameter for the junction intensity mask thresholding. If not set, the value is determined automatically via Otsu thresholding. |
| feature_of_interest | runtime | string | “area” | | Name of the feature for which neighborhood statistics should be calculated. Any feature can be used here. Look at the features to see all available options. |
| min_cell_size | runtime | integer | 50 | 0 - inf | Minimal expected cell size in pixels. Threshold value for the analysis. Cells with a smaller value will be excluded from the analysis. |
| min_nucleus_size | runtime | integer | 10 | 0 - inf | The minimal diameter of the nucleus. Threshold value for the analysis. Cells with a nucleus with a smaller value will be excluded from the analysis. |
| min_organelle_size | runtime | integer | 10 | 0 - inf | The minimal diameter of the organelle. Threshold value for the analysis. Cells with an organelle with a smaller value will be excluded from the analysis. |
| dp_epsilon | runtime | integer | 5 | 0 - inf | Parameter for the edge detection algorithm. The higher the value, the fewer edges are detected and vice versa. |
| cue_direction | runtime | integer | 0 | 0 - 359 | Determines the cue direction (e.g. flow) for your image in degrees. 0° corresponds to a cue from left to right, 90° to a cue from top to bottom. |
| connection_graph | runtime | bool | True | True, False | Whether to use a connection graph to model cells or not. |
| segmentation_algorithm | runtime | string | “CellposeSegmenter” | | The segmentation algorithm to use. Choose between “CellposeSegmenter” and “SamSegmenter”. Note that segmentation parameters are different for each algorithm! |
| clear_border | runtime | bool | True | True, False | If true, removes any segmentation that is not complete because the cell protrudes beyond the edge of the image. |
| remove_small_objects_size | runtime | integer | 10 | 0 - inf | Minimal expected object size in pixels. Segmentation objects with a smaller value will be removed before the analysis starts. |
| keyfile_condition_cols | runtime | list | [“short_name”] | | Only required if the run_key option is used. List of columns transferred to the result table; the first entry serves as the unique identifier of conditions. |
| save_sc_images | runtime | bool | False | True, False | If true, saves the closeup single cell images in the output path. |
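The cue_direction convention can be made concrete with a short sketch. It assumes image coordinates with x pointing right and y pointing down, which is consistent with 0° meaning left to right and 90° meaning top to bottom. The function name is hypothetical and how polarityjam uses the angle internally is not shown here.

```python
import math


def cue_direction_vector(cue_direction_deg):
    """Unit vector (dx, dy) of the cue in image coordinates (x right, y down)."""
    angle = math.radians(cue_direction_deg % 360)
    # Rounding removes floating point noise such as 6.1e-17 for cos(90 degrees).
    return round(math.cos(angle), 12), round(math.sin(angle), 12)


print(cue_direction_vector(0))   # (1.0, 0.0): left to right
print(cue_direction_vector(90))  # (0.0, 1.0): top to bottom
```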

Plot Parameter

| Parameter | Category | Type | Default | Options | Description |
| --- | --- | --- | --- | --- | --- |
| plot_junctions | plot | bool | True | True, False | Indicates whether to perform the junction polarity plot. |
| plot_polarity | plot | bool | True | True, False | Indicates whether to perform the organelle polarity plot. |
| plot_elongation | plot | bool | True | True, False | Indicates whether to perform the elongation plot. |
| plot_circularity | plot | bool | True | True, False | Indicates whether to perform the plot of cell (and nuclei) circularity. |
| plot_marker | plot | bool | True | True, False | Indicates whether to perform the marker polarity plot. |
| plot_ratio_method | plot | bool | False | currently disabled | Indicates whether to perform the ratio plot. |
| plot_shape_orientation | plot | bool | True | True, False | Indicates whether to perform the shape orientation plot. |
| plot_foi | plot | bool | True | True, False | Indicates whether to perform the feature of interest plot. |
| plot_sc_images | plot | bool | True | True, False | Indicates whether to perform the closeup single cell images plot. |
| plot_threshold_masks | plot | bool | True | True, False | Indicates whether to perform the threshold masks plot. |
| plot_sc_partitions | plot | bool | True | True, False | Indicates whether to plot individual partitioned cells in closeup. |
| show_scalebar | plot | bool | True | True, False | Shows the scalebar with the pixel to micron ratio specified with the image. |
| show_statistics | plot | bool | True | True, False | Adds circular statistics to the plot title. |
| show_polarity_angles | plot | bool | True | True, False | Indicates whether to additionally add the polarity angles to the polarity plots. |
| show_graphics_axis | plot | bool | False | True, False | Additionally shows the axes of the image. |
| length_scalebar_microns | plot | float | 10 | 0 - inf | Length of the scalebar in microns. |
| outline_width | plot | integer | 2 | 0 - inf | Outline width of a cell. |
| graphics_output_format | plot | string | “png”, “pdf” | “png”, “pdf”, “svg” | The output format of the plot figures. Several can be specified. Default is png and pdf. |
| dpi | plot | integer | 300 | 50 - 1200 | Resolution of the plots. Specifies the dots per inch. |
| graphics_width | plot | integer | 5 | 1 - 15 | The width of the output plot figures in inches. |
| graphics_height | plot | integer | 5 | 1 - 15 | The height of the output plot figures in inches. |
| fontsize_text_annotations | plot | integer | 6 | 1 - inf | Font size of the text annotations. |
| font_color | plot | string | “w” | matplotlib colors | Color of the text annotations. |
| marker_size | plot | integer | 2 | 1 - inf | Size of the markers in the plot. |
| alpha | plot | float | 0.5 | 0 - 1 | Transparency of the overlay masks in the plot. |
| alpha_cell_outline | plot | float | 0.5 | 0 - 1 | Transparency of the cell outline in the plot. |
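The interplay of graphics_width, graphics_height, and dpi can be estimated with simple arithmetic. The sketch below assumes that a raster output such as png is rendered at exactly width × dpi by height × dpi pixels; the actual size may differ slightly if the plotting backend crops or pads the figure. The function name is hypothetical.

```python
def figure_size_pixels(graphics_width=5, graphics_height=5, dpi=300):
    """Nominal raster size (width, height) in pixels for a figure given in inches."""
    return graphics_width * dpi, graphics_height * dpi


# The documented defaults: 5 inches at 300 dots per inch in both directions.
print(figure_size_pixels())           # (1500, 1500)
print(figure_size_pixels(10, 5, 50))  # (500, 250)
```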

Key file

Often, analysts face not only the problem of performing the analysis itself, but also the question of how and where to store the data. Iterative acquisition of images and varying experimental settings sometimes require complex folder structures and naming schemes to organize the data. Frequently, the data is distributed over several physical devices, which raises the question of how to execute a tool on a dedicated subset of images. It is not uncommon that a lot of time has to be spent before the analysis can be performed. Moreover, analyzing several experimental conditions often requires repeating the whole pipeline several times to get the desired output. To tackle this problem, polarityjam offers the execution option run_key, which accepts a .csv file describing the storage structure and the conditions. So that the data can still be migrated without altering the csv, paths are relative to a given root folder (e.g. inputpath).

The structure of the csv is given as follows:

| folder_name | short_name |
| --- | --- |
| set_1 | cond_1 |
| set_2 | cond_2 |

The folder structure will also be created in the provided output path. Specify a short_name different from the folder_name to rename a folder (e.g. folder set_1 will be named cond_1 in the output path).

To illustrate the concept, consider the following input folder structure:

input
├── set_1
│   ├── myfile1.tif
│   └── myfile2.tif
└── set_2
    └── myfile3.tif

The corresponding output folder structure would be:

output
├── cond_1
│   ├── myfile1.csv
│   ├── myfile2.csv
│   └── merged_table_cond_1.csv
├── cond_2
│   ├── myfile3.csv
│   └── merged_table_cond_2.csv
├── key_file.csv
├── run_20220610_13-10-10.log
├── run_20220610_13-10-10_param.yml
└── summary_table.csv
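The key file for the example above can also be written with a few lines of code. This is a sketch under assumptions: a comma-separated file with the two columns shown in the table is assumed, and the folder names `input` and `output` are the ones from the tree views, used here for illustration only.

```python
import csv
import io
from pathlib import PurePosixPath

# Rows of the key file: input sub-folder (relative to the root folder) and its short name.
rows = [
    {"folder_name": "set_1", "short_name": "cond_1"},
    {"folder_name": "set_2", "short_name": "cond_2"},
]

# Write to a string buffer here; use open("inputkey.csv", "w", newline="") to write to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["folder_name", "short_name"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
key_file_text = buffer.getvalue()
print(key_file_text)

# Reproduce the folder mapping described above: input/<folder_name> -> output/<short_name>.
mapping = {}
for row in csv.DictReader(io.StringIO(key_file_text)):
    source = PurePosixPath("input") / row["folder_name"]
    target = PurePosixPath("output") / row["short_name"]
    mapping[str(source)] = str(target)
    print(f"{source} -> {target}")
```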

Warning

Using OS-specific paths in the key-file.csv might hurt reproducibility (e.g. Windows paths differ from Unix paths)!
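One way to reduce this risk is to store folder names as relative paths with forward slashes. The sketch below uses only Python's pathlib; the helper `to_portable` is hypothetical, and whether forward slashes are accepted by polarityjam on every platform is an assumption that should be verified for your setup.

```python
from pathlib import PureWindowsPath


def to_portable(folder_name):
    """Return folder_name with forward slashes; reject absolute paths (hypothetical helper)."""
    path = PureWindowsPath(folder_name)  # understands both "\\" and "/" as separators
    if path.drive or path.root:
        raise ValueError(f"key file paths should be relative to the root folder: {folder_name!r}")
    return path.as_posix()


print(to_portable(r"set_1\replicate_a"))  # set_1/replicate_a
print(to_portable("set_1/replicate_a"))   # set_1/replicate_a
```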

Web app

The R-shiny web app further analyses the results of the feature extraction process in the browser. Several statistics are available whose parameters can be adjusted at runtime to immediately observe the change in the corresponding visualization. This greatly facilitates exploring the data and revealing interesting patterns. To learn more about the statistics, jump to circular statistics and continue reading, or visit the method section.

Testing

We use a testing framework to make sure outcomes are as expected. To run the software with our example data provided in the package use the following command:

polarityjam_test

This will not keep the output on the disk. To look at the output of the tests specify a target folder:

polarityjam_test --target-folder=/tmp/mytarget

We tested our software on:

  • macOS 12.7.4 (21H1123), Kernel Version: Darwin 21.6.0

  • Ubuntu 22.04.4 LTS, Kernel Version: 6.5.0-1018-azure

  • Windows 10.0.20348 Build 2402 (without plot tests)