Usage

Run options

To start the feature extraction process, make sure you followed the manual installation procedure. Then run polarityjam on the commandline to look at the available run modes. There are 3 options to start the feature extraction process run, run_stack, and run_key which are summarized in the table below.

Mode	Arguments	Description
run	paramfile.yml input.tif outputpath	Should be used when a single image needs to be processed.
run_stack	paramfile.yml inputpath outputpath	Should be used when a set of images in a folder needs to be processed
run_key	paramfile.yml inputpath inputkey.csv outputpath	Should be used when the images that need to be processed have a complex folder structure with multiple sub-folders that need to be excluded from the analysis

The following provides examples of how to run the feature extraction process using the polariyjam command line tool:

# Run a single image
polarityjam run paramfile.yml input.tif outputpath

# Run a stack of images
polarityjam run-stack paramfile.yml inputpath outputpath

# Run a set of images with a complex folder structure
polarityjam run-key paramfile.yml inputpath inputkey.csv outputpath

Parameter file

Most important argument to provide for all modes is the parameter.yml file. In this .yml file format, all options can be specified how the feature extraction pipeline treats the data and what extraction steps to perform. You might want to look at this example parameter file.

The following tables list and describe all options that are available for executing the pipeline. Although they are separated in four different topics, they can be defined in a single parameter.yml file.

Image Parameter

Parameter	Category	Type	Default	Options	Description

channel_junction	image	integer		-1,0,1,2	Specifies which channel in the input image(s) holds information about the junction signals. -1 to indicate there is no channel.
channel_nucleus	image	integer		-1,0,1,2	Specifies which channel in the input image(s) holds information about the nucleus. -1 to indicate there is no channel.
channel_organelle	image	integer		-1,0,1,2	Specifies which channel in the input image(s) holds information about the organelle (e.g golgi apparatus). -1 to indicate there is no channel.
channel_expression_marker	image	integer		-1,0,1,2	Specifies which channel in the input image(s) holds information about the expression marker. -1 to indicate there is no channel.
pixel_to_micron_ratio	image	float	1		Specifies the pixel to micron ratio. E.g. a pixel is worth how many micro meter. Default is 1.

Cellpose Segmentation Parameter

Parameter	Category	Type	Default	Options	Description

manually_annotated_mask	segmentation	string			PolarityJaM looks for an available segmentation in the input path. This parameter specifies the suffix for manually annotated masks. Leave empty to use the suffix “_seg.npy” (cellpose default).
store_segmentation	segmentation	bool	False	True, False	If true, stores the cellpose segmentation masks in the input path (CAUTION: not in the output path!).
use_given_mask	segmentation	bool	True	True, False	Indicated whether to use the masks in the input path (if any) or not. Default is true.
model_type	segmentation	“custom”, <model type>	“cyto”		The model type supported by your segmentation algorithm. For cellpose “cyto” “cyto2”, “custom” is possible. If “custom” is chosen, “cp_model_path” must be set.
model_path	segmentation	string	“”		The Path to the custom model for your segmentation algorithm. Only works in combination with “cp_model_type”.
estimated_cell_diameter	segmentation	integer	100	0 - inf	The estimated cell diameter of the cells in your input image(s). Default 100 pixels.
estimated_nucleus_diameter	segmentation	integer	30	0 - inf	The estimated diameter of the nuclei in your input image(s). Default 30 pixels.
flow_threshold	segmentation	float	0.4		Increase this threshold if cellpose is not returning as many ROIs as you would expect. Similarly, decrease this threshold if cellpose is returning too many ill-shaped ROIs.
cellprob_threshold	segmentation	float	0.0		Decrease this threshold if cellpose is not returning as many ROIs as you’d expect. Increase this threshold if cellpose is returning too many ROIs particularly from dim areas.
use_gpu	segmentation	bool	False	True, False	Indicates whether to use the GPU for faster segmentation. Default is false
channel_cell_segmentation	segmentation	string	“channel_junction”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is to “channel_junction”.
channel_nuclei_segmentation	segmentation	string	“channel_nucleus”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is to “channel_nucleus”.

DeepCell Segmentation Parameter

Parameter	Category	Type	Default	Options	Description

segmentation_mode	segmentation	string	“whole-cell”	“whole-cell”, “nuclear”	Determines the segmentation mode. Either “whole-cell” or “nuclear”.
save_mask	segmentation	bool	True	True, False	Stores masks on disk in numpy format.
maxima_threshold	segmentation	float	0.18	0 - inf	To finetune specific and consistent errors in your data, this argument can be used during postprocessing. Lower values will result in more cells being detected. Higher values will result in fewer cells being detected.
maxima_smooth	segmentation	float	0.1	0 - inf	Controls what the model considers a unique cell. Lower values will result in more separate cells being predicted, whereas higher values will result in fewer cells.
interior_threshold	segmentation	float	0.1	0 - inf	Controls how conservative the model is in estimating what is a cell vs what is background. Lower values will result in larger cells, whereas higher values will result in smaller smalls.
small_objects_threshold	segmentation	integer	25	0 - inf	Minimal volume size in pixel before an object is detected as such.
fill_holes_threshold	segmentation	integer	5	0 - inf	Filling any holes that are contained in the predicted object up to a certain size.
pixel_expansion	segmentation	integer	0	0 - inf	Expands the predicted object by a certain number of pixels.
channel_cell_segmentation	segmentation	string	“channel_junction”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is to “channel_junction”.
channel_nuclei_segmentation	segmentation	string	“channel_nucleus”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is to “channel_nucleus”.

Segment Anything Segmentation Parameter

Parameter	Category	Type	Default	Options	Description

model_url	segmentation	url	“https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth”	“https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth” “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth” “https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth” any other SAM provided link	URL where to retrieve the model weights. Please look at segmentanything for curated list! Weights will be downloaded only once!
model_name	segmentation	string	“sam_vit_h”	“sam_vit_h”, “sam_vit_l”, “sam_vit_b”	Name of the model to use. Please look at segmentanything for curated list!
channel_cell_segmentation	segmentation	string	“channel_junction”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is to “channel_junction”
channel_nuclei_segmentation	segmentation	string	“channel_nucleus”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is to “channel_nucleus”.
channel_organelle_segmentation	segmentation	string	“channel_organelle”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the organelle segmentation. Default is to “channel_organelle”.

MicroSAM Segmentation Parameter

Parameter	Category	Type	Default	Options	Description

model_name	segmentation	string	“sam_vit_h”	“sam_vit_h”, “sam_vit_l”, “sam_vit_b”	Name of the model to use. See MicroSam for information.
checkpoint_path	segmentation	string	“”	“”	Path to the checkpoint file.
embedding_path	segmentation	string	“”	“”	Path to the embedding file.
pred_iou_thresh	segmentation	float	0.8	0 - 1	Threshold for the predicted IoU.
channel_cell_segmentation	segmentation	string	“channel_junction”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the cell segmentation. Default is to “channel_junction”
channel_nuclei_segmentation	segmentation	string	“channel_nucleus”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the nuclei segmentation. Default is to “channel_nucleus”.
channel_organelle_segmentation	segmentation	string	“channel_organelle”	“channel_junction” “channel_nucleus” “channel_organelle “channel_expression_marker”	Specifies which channel in the input image(s) should be used to perform the organelle segmentation. Default is to “channel_organelle”.

Runtime Parameter

Parameter	Category	Type	Default	Options	Description

extract_group_features	runtime	bool	False	True, False	If true, extracts group features based on a feature of interest.
membrane_thickness	runtime	integer	5	0 - inf	Expected membrane thickness.
junction_threshold	runtime	float	-1	0 - inf	Parameter for the junction intensity mask thresholding. If not set value is automatically detected via otsu thresholding.
feature_of_interest	runtime	string	“area”		Name of the feature for which a neighborhood statistics should be calculated. Any feature can be used here. Look at the features to see all available options.
min_cell_size	runtime	integer	50	0 - inf	Minimal expected cell size in pixel. Threshold value for the analysis. Cells with a smaller value will be excluded from the analysis.
min_nucleus_size	runtime	integer	10	0 - inf	The minimal diameter of the nucleus size. Threshold value for the analysis. Cells with a nucleus with a smaller value will be excluded from the analysis.
min_organelle_size	runtime	integer	10	0 - inf	The minimal diameter of the organelle. Threshold value for the analysis. Cells with an organelle with a smaller value will be excluded from the analysis.
dp_epsilon	runtime	integer	5	0 - inf	Parameter for the edge detection algorithm. The higher the value, the less edges are detected and vice versa.
cue_direction	runtime	integer	0	0 - 359	Determines the cue direction (e.g. flow) for your image in degree. 0° corresponds to a cue from left to right. 90° from top to bottom.
connection_graph	runtime	bool	True	True, False	Whether to use a connection graph to model cells or not.
segmentation_algorithm	runtime	string	“CellposeSegmenter”		The segmentation algorithm to use. Choose between “CellposeSegmenter” and “SamSegmenter”. Note that segmentation parameters are different for each algorithm!
clear_border	runtime	bool	True	True, False	If true, removes any segmentation that is not complete because the cell protrude beyond the edge of the image.
remove_small_objects_size	runtime	integer	10	0 - inf	Minimal expected object size in pixel. Segmentation objects with a smaller value will be removed before the analysis starts.
keyfile_condition_cols	runtime	list	[“short_name”]		Only required if the run_key option is used. List of columns transferred to the result table, first entry serves as unique identifier of conditions.
save_sc_images	runtime	bool	False	True, False	If true, saves the closeup single cell images in the output path.

Plot Parameter

Parameter	Category	Type	Default	Options	Description
plot_junctions	plot	bool	True	True, False	Indicates whether to perform the junction polarity plot.
plot_polarity	plot	bool	True	True, False	Indicates whether to perform the organelle polarity plot.
plot_elongation	plot	bool	True	True, False	Indicates whether to perform the elongation plot.
plot_circularity	plot	bool	True	True, False	Indicates whether to perform plot of cell (and nuclei) circularity.
plot_marker	plot	bool	True	True, False	Indicates whether to perform the marker polarity plot.
plot_ratio_method	plot	bool	False	currently disabled	Indicates whether to perform the ratio plot.
plot_shape_orientation	plot	bool	True	True, False	Indicates whether to perform the shape orientation plot.
plot_foi	plot	bool	True	True, False	Indicates whether to perform the feature of interest plot.
plot_sc_images	plot	bool	True	True, False	Indicates whether to perform the closeup single cell images plot.
plot_threshold_masks	plot	bool	True	True, False	Indicates whether to perform the threshold masks plot.
plot_sc_partitions	plot	bool	True	True, False	Indicates whether to plot individual partitioned cells in closeup.
show_scalebar	plot	bool	True	True, False	Shows the scalebar with the pixel to micron ratio specified with the image.
show_statistics	plot	bool	True	True, False	Add circular statistics to plot title.
show_polarity_angles	plot	bool	True	True, False	Indicates whether to additionally add the polarity angles to the polarity plots.
show_graphics_axis	plot	bool	False	True, False	Additionally shows the axes of the image.
length_scalebar_microns	plot	float	10	0 - inf	Length of the scalebar in microns.
outline_width	plot	integer	2	0 - inf	Outline width of a cell.
graphics_output_format	plot	string	“png”, “pdf”	“png”, “pdf” , “svg”	The output format of the plot figures. Several can be specified. Default is png and pdf.
dpi	plot	integer	300	50 - 1200	Resolution of the plots. Specifies the dots per inch.
graphics_width	plot	integer	5	1 - 15	The width of the output plot figures in inches.
graphics_height	plot	integer	5	1 - 15	The width of the output plot figures in inches.
fontsize_text_annotations	plot	integer	6	1 - inf	Fontsize of the text annotations.
font_color	plot	string	“w”	matplotlib colors	Color of the text annotations.
marker_size	plot	integer	2	1 - inf	Size of the markers in the plot.
alpha	plot	float	0.5	0 - 1	Transparency of the overlay masks in the plot.
alpha_cell_outline	plot	float	0.5	0 - 1	Transparency of the cell outline in the plot.

Key file

Often, analysts are challenged not only with the problem of actually performing the analysis, but also with the problem of how and where to store the data. Iterative acquisition of images as well as various experimental settings sometimes require complex folder structures and naming schema to organize data. Frequently, researchers face the problem of data being distributed over several physical devices, leaving them with the problem of how to execute a certain tool on a dedicated subset of images. Not often a lot of time is necessary to spend before the analysis is performed. Moreover, performing analysis steps on several experimental conditions often requires repeating the whole pipeline several times to get the desired output. To tackle this problem, polarityjam offers the execution option run_key that accepts a .csv file describing the storage structures and conditions. To still be able to migrate the data without altering the csv, paths are relative to a given root folder (e.g. inputpath).

The structure of the csv is given as follows:

folder_name	short_name
set_1	cond_1
set_2	cond_2

Folder structure will also be created in the provided output path. Specify a short_name different to the folder_name to rename each folder. (e.g. folder set_1 will be named cond_1 in the output path)

To better understand the concept, in the following you see a tree structure of the input and output folders visualized:

input
├── set_1
│   ├── myfile1.tif
│   └── myfile2.tif
└── set_2
    └── myfile3.tif

The corresponding output folder structure would be:

output
├── cond_1
│   ├── myfile1.csv
│   ├── myfile2.csv
│   └── merged_table_cond_1.csv
├── cond_2
│   ├── myfile3.csv
│   └── merged_table_cond_2.csv
├── key_file.csv
├── run_20220610_13-10-10.log
├── run_20220610_13-10-10_param.yml
└── summary_table.csv

Warning

Using OS specific paths in the key-file.csv might hurt reproducibility! (e.g. windows paths are different than unix paths!)

Web app

The R-shiny web app further analyses the results of the feature extraction process in the browser. There are several statistics available whose parameters can be adapted/adjusted during runtime to immediately observe the change in the corresponding visualization. Thus, exploring the data and revealing interesting patterns is heavily facilitated. To get to know more about the statics jump to circular statistics and continue reading or visit the method section.

Testing

We use a testing framework to make sure outcomes are as expected. To run the software with our example data provided in the package use the following command:

polarityjam_test

This will not keep the output on the disk. To look at the output of the tests specify a target folder:

polarityjam_test --target-folder=/tmp/mytarget

We tested our software on:

macOS 12.7.4 (21H1123), Kernel Version: Darwin 21.6.0 ubuntu 22.04.4 LTS, Kernel Version: 6.5.0-1018-azure Windows 10.0.20348 Build 2402 (without plot tests)