
Binary tree algorithm for mass cytometry data analysis.

Usage

CytofTree(
  M,
  minleaf = 1,
  t = 0.1,
  verbose = TRUE,
  force_first_markers = NULL,
  transformation = c("asinh", "biexp", "log10", "none"),
  num_col = 1:ncol(M)
)

Arguments

M

A matrix of size n x p containing mass cytometry measurements of n cells on p markers.

minleaf

An integer indicating the minimum number of cells per population. Default is 1.

t

A non-negative real number used as a threshold for comparison with the normalized AIC computed at each node of the tree. A higher value limits the height of the tree. Default is 0.1.

verbose

A logical controlling whether a text progress bar is displayed during the execution of the algorithm. Default is TRUE.

force_first_markers

A vector of indices of the markers to split the data on first. This argument is used in the semi-supervised setting: it forces the algorithm to consider those markers first, in the order they appear in force_first_markers, and forces a split at each of the corresponding nodes. Default is NULL, in which case the clustering algorithm is unsupervised.

transformation

A string indicating the transformation to use, among asinh, biexp, log10 and none. Default is asinh.

num_col

An integer vector of indices indicating the columns to be transformed. Default is 1:ncol(M), which transforms all columns.
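To illustrate how transformation and num_col interact, here is a minimal base R sketch of an arcsinh transformation applied to a subset of columns only. The toy matrix, its column names and the cofactor of 5 are assumptions made for this illustration; the scaling actually used inside CytofTree is not stated on this page and may differ.

```r
# Toy matrix standing in for M: 5 cells, 4 columns (values are made up).
M <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50,
              0, 5, 50, 500, 5000,
              2, 4, 8, 16, 32),
            nrow = 5,
            dimnames = list(NULL, c("Time", "Cell_length", "CD3", "CD4")))

# Columns to transform: everything except the first two.
num_col <- 3:ncol(M)

# Assumed cofactor, chosen for illustration only (not taken from the package).
cofactor <- 5

# Transform only the selected columns; the others are copied unchanged.
M_trans <- M
M_trans[, num_col] <- asinh(M[, num_col] / cofactor)

M_trans
```

Columns outside num_col (here Time and Cell_length) keep their raw values, which is the behaviour the num_col argument is described as providing.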

Value

An object of class 'cytomeTree' providing a partitioning of the set of n cells.

  • annotation A data.frame containing the annotation of each cell population underlying the tree pattern.

  • labels The partitioning of the set of n cells.

  • M The transformed matrix of mass cytometry.

  • mark_tree A two-level list containing the markers used for node splitting.

  • transformation The transformation used.

  • num_col The indices of the transformed columns.

Details

The data can first be transformed using one of the available transformations. The algorithm then builds a binary tree whose nodes are subpopulations of cells. At each node, the observed cells and markers are modeled by both a family of normal distributions and a family of bimodal normal mixture distributions. The decision to split is based on a normalized difference of AIC between the two families, which is compared with the threshold t.
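The comparison described above can be sketched for a single marker in base R. This is an illustrative sketch only, not the package's implementation: the helper names loglik_normal, loglik_mixture2 and normalized_aic_diff are hypothetical, the mixture is fitted with a basic EM loop, and the normalization by 2n is an assumption about how the AIC difference is scaled.

```r
# Log-likelihood of a single normal distribution fitted by maximum likelihood.
loglik_normal <- function(x) {
  mu <- mean(x)
  sigma <- sqrt(mean((x - mu)^2))
  sum(dnorm(x, mu, sigma, log = TRUE))
}

# Log-likelihood of a two-component normal mixture fitted by a basic EM loop.
# Hypothetical helper: no safeguards against degenerate components.
loglik_mixture2 <- function(x, n_iter = 200) {
  # Initialise the two components by splitting the data at the median.
  upper <- x > median(x)
  w <- mean(upper)                          # weight of the upper component
  mu <- c(mean(x[!upper]), mean(x[upper]))
  sigma <- rep(sd(x) / 2, 2)
  for (i in seq_len(n_iter)) {
    # E-step: posterior probability of belonging to the upper component.
    d1 <- (1 - w) * dnorm(x, mu[1], sigma[1])
    d2 <- w * dnorm(x, mu[2], sigma[2])
    r <- d2 / (d1 + d2)
    # M-step: update weight, means and standard deviations.
    w <- mean(r)
    mu <- c(sum((1 - r) * x) / sum(1 - r), sum(r * x) / sum(r))
    sigma <- c(sqrt(sum((1 - r) * (x - mu[1])^2) / sum(1 - r)),
               sqrt(sum(r * (x - mu[2])^2) / sum(r)))
  }
  sum(log((1 - w) * dnorm(x, mu[1], sigma[1]) + w * dnorm(x, mu[2], sigma[2])))
}

# Normalized AIC difference between the unimodal and bimodal fits.
# AIC = 2k - 2 logLik, with k = 2 (normal) and k = 5 (two-component mixture).
# Dividing by 2n is an assumption made for this sketch.
normalized_aic_diff <- function(x) {
  aic_uni <- 2 * 2 - 2 * loglik_normal(x)
  aic_bi <- 2 * 5 - 2 * loglik_mixture2(x)
  (aic_uni - aic_bi) / (2 * length(x))
}

set.seed(1)
bimodal <- c(rnorm(500, mean = 0), rnorm(500, mean = 5))  # two clear modes
unimodal <- rnorm(1000)                                   # a single mode

t <- 0.1
D_bimodal <- normalized_aic_diff(bimodal)
D_unimodal <- normalized_aic_diff(unimodal)

# A marker would be a candidate for splitting when its value exceeds t.
D_bimodal > t
D_unimodal > t
```

Under these assumptions, a clearly bimodal marker gives a value above t and would be a candidate for splitting, while a unimodal marker gives a value close to zero. Raising t makes splits rarer, which matches the description of t as limiting the height of the tree.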

Author

Anthony Devaux, Boris Hejblum

Examples

data(IMdata)

# dimension of data
dim(IMdata)
#> [1] 10000    39

# given the size of the dataset, the code below can take several minutes to run

if(interactive()){
# Don't transform the Time and Cell_length columns
num_col <- 3:ncol(IMdata)

# Build the CytofTree binary tree
tree <- CytofTree(M = IMdata, minleaf = 1, t = 0.1, transformation = "asinh", num_col = num_col)

# Annotation
annot <- Annotation(tree, plot = FALSE, K2markers = colnames(IMdata))

# Provide subpopulations
annot$combinations
}