This function performs differential expression analysis of clonal alterations for pairs of clonal sub-populations and perturbed genes.

decal(
  perturbations,
  count,
  clone,
  theta = NULL,
  theta_sample = 2000,
  min_mu = 0.05,
  min_n = 3,
  min_x = 1,
  gene_col = "gene",
  clone_col = "clone",
  p_method = "BH"
)

Arguments

perturbations

table of clone and gene perturbation pairs whose differential expression effect is to be modeled.

count

UMI count matrix with cells as columns and genes (or features) as rows.

clone

list of cells per clone.

theta

gene (or feature) dispersion.

theta_sample

number of genes sampled for the preliminary theta estimation.

min_mu

minimal overall average expression (mu) required.

min_n

minimal number of perturbed cells (n1) required.

min_x

minimal average expression of perturbed (x1) and non-perturbed cells (x0) required.

gene_col

gene index column name in perturbations

clone_col

clone index column name in perturbations

p_method

p-value adjustment method for multiple comparisons. See stats::p.adjust.
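The three main inputs can be pictured with a small toy data set. This is only a sketch: the gene, cell, and clone names are made up, and it assumes that the gene and clone columns of perturbations refer to row names of count and names of clone (integer indices may be equally valid). The call itself is shown as a comment because data this small is only meant to illustrate the shapes.

```r
# Toy inputs; every name and size here is hypothetical.
set.seed(1)
count <- matrix(
  rpois(5 * 12, lambda = 3), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12))
)

# One character vector of cell names per clone
clone <- list(
  cloneA = paste0("cell", 1:4),
  cloneB = paste0("cell", 5:8)
)

# One row per clone and gene pair to test
perturbations <- data.frame(
  gene  = c("gene1", "gene3"),
  clone = c("cloneA", "cloneB")
)

## Not run:
## res <- decal(perturbations, count, clone,
##              gene_col = "gene", clone_col = "clone", p_method = "BH")
```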

Value

The perturbations table extended with the following columns:

  • n0 and n1: number of non-perturbed and perturbed cells

  • x0 and x1: average count of non-perturbed and perturbed cells

  • mu: overall average expression

  • theta: negative binomial dispersion parameter

  • xb: perturbed cells' estimated average count

  • z: perturbed cells' standardized z-score effect

  • lfc: perturbed cells' log2 fold-change effect

  • pvalue

  • p_adjusted
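As a rough illustration of the descriptive columns, the sketch below computes n0, n1, x0, x1 and mu for a single clone and gene pair from a toy count matrix. It is not the package's own code: the data are simulated, mu is assumed to be the plain mean over all cells, and the filtering rule (in particular requiring min_x in either group rather than both) is an assumption.

```r
set.seed(42)
count <- matrix(
  rpois(3 * 10, lambda = 2), nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:10))
)
clone_cells <- paste0("cell", 1:4)  # hypothetical perturbed clone
gene <- "gene2"                     # hypothetical perturbed gene

y <- count[gene, ]
is_perturbed <- colnames(count) %in% clone_cells

stats <- data.frame(
  n0 = sum(!is_perturbed),     # number of non-perturbed cells
  n1 = sum(is_perturbed),      # number of perturbed cells
  x0 = mean(y[!is_perturbed]), # average count, non-perturbed cells
  x1 = mean(y[is_perturbed]),  # average count, perturbed cells
  mu = mean(y)                 # overall average expression (assumed definition)
)

# Thresholds copied from the defaults in the usage section.
min_mu <- 0.05; min_n <- 3; min_x <- 1
# ASSUMPTION: a pair is kept when either group reaches min_x.
keep <- stats$mu >= min_mu && stats$n1 >= min_n &&
  (stats$x0 >= min_x || stats$x1 >= min_x)
```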

Details

Given a table of clone and gene pairs, a UMI count matrix, and a list of cells per clone, this function models the gene expression (Y) of each perturbation pair with a negative binomial (a.k.a. Gamma-Poisson) distribution. The expected count is a function of the clone indicator variable (X), offset by the cell total count (D), as described by the model:

$$Y \sim NB(xb, \theta)$$
$$\log(xb) = \beta_0 + \beta_x X + \log(D)$$
$$\theta \sim \mu$$
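For a single gene, a model of this form can be sketched with a standard GLM. The example below simulates counts and fits them with glm() and the negative.binomial() family from MASS, treating theta as known. It is an illustration under stated assumptions, not the function's internal code: the simulated values are arbitrary, and deriving lfc as the clone coefficient divided by log(2) and z as a Wald statistic is an assumed reading of the output columns.

```r
library(MASS)  # provides the negative.binomial() family

set.seed(123)
n_cells <- 200
X <- rep(c(0, 1), times = c(150, 50))    # clone indicator
D <- round(runif(n_cells, 2000, 10000))  # total UMI count per cell
theta  <- 10                             # dispersion, treated as known here
beta_0 <- log(1e-3)                      # baseline expression rate per UMI
beta_x <- log(0.5)                       # perturbation halves expression

# Simulate Y ~ NB(xb, theta) with log(xb) = beta_0 + beta_x * X + log(D)
xb <- exp(beta_0 + beta_x * X + log(D))
Y  <- rnbinom(n_cells, mu = xb, size = theta)

# Fit the same model, with log(D) entering as an offset
fit <- glm(Y ~ X + offset(log(D)), family = negative.binomial(theta))

# dispersion = 1 keeps the NB variance as specified and yields z statistics
coefs  <- summary(fit, dispersion = 1)$coefficients
lfc    <- coefs["X", "Estimate"] / log(2)  # natural-log effect on the log2 scale
z      <- coefs["X", "z value"]            # Wald z-score (assumed definition)
pvalue <- coefs["X", "Pr(>|z|)"]
```

Because the simulated effect is a halving of expression, lfc should come out near -1, up to sampling noise.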

The gene dispersion parameter (theta) is estimated and regularized in two steps, following Hafemeister & Satija (2019). First, for a subset of genes (of size theta_sample), it fits a Poisson regression offset by log(D) and estimates a crude theta by maximum likelihood from the observed counts and the regression results. Next, it regularizes the theta estimates and extends them to all genes with a kernel smoothing function of the average count (mu).
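The two steps can be sketched on simulated data as follows. This is a simplified illustration rather than the package's implementation: it uses MASS::theta.ml() for the crude estimate and stats::ksmooth() on the log10 scale for the smoothing, and the kernel, bandwidth, sample size, and simulated values are all assumptions chosen for the example.

```r
library(MASS)  # provides theta.ml()

set.seed(7)
n_cells <- 300
n_genes <- 40
D <- round(runif(n_cells, 2000, 10000))  # total UMI count per cell
rate <- 10^runif(n_genes, -3.5, -2)      # per-UMI expression rate of each gene
true_theta <- 2
# genes x cells matrix of simulated negative binomial counts
count <- t(sapply(rate, function(r) rnbinom(n_cells, mu = r * D, size = true_theta)))
mu <- rowMeans(count)

# Step 1: Poisson regression offset by log(D), then a crude ML theta,
# computed only for a random subset of genes.
sampled <- sample(n_genes, 20)
crude_theta <- sapply(sampled, function(g) {
  y <- count[g, ]
  fit <- glm(y ~ 1 + offset(log(D)), family = poisson)
  suppressWarnings(as.numeric(theta.ml(y, fitted(fit), limit = 50)))
})
ok <- is.finite(crude_theta) & crude_theta > 0  # drop degenerate estimates

# Step 2: smooth log10(theta) against log10(mu) and evaluate for every gene.
smoothed <- ksmooth(
  x = log10(mu[sampled][ok]), y = log10(crude_theta[ok]),
  kernel = "normal", bandwidth = 1, x.points = log10(mu)
)
# ksmooth() returns its output sorted by x, so map it back to gene order
theta <- numeric(n_genes)
theta[order(mu)] <- 10^smoothed$y
```

Smoothing on the log scale keeps the regularized theta positive and lets genes outside the sampled subset borrow an estimate from genes with similar average count.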