DECAL: Differential Expression analysis of Clonal Alterations Local effects based on Negative Binomial distribution

This function performs differential expression analysis of clonal alterations for pairs of clonal sub-populations and perturbed genes.

decal(
  perturbations,
  count,
  clone,
  theta = NULL,
  theta_sample = 2000,
  min_mu = 0.05,
  min_n = 3,
  min_x = 1,
  gene_col = "gene",
  clone_col = "clone",
  p_method = "BH"
)

Arguments

perturbations	table of clone and gene perturbation pairs for which to model the differential expression effect.
count	UMI count matrix with cells as columns and genes (or features) as rows.
clone	list of cells per clone.
theta	gene (or feature) dispersion parameter. With the default `NULL`, it is estimated as described in Details.
theta_sample	number of genes sampled for the preliminary `theta` estimation.
min_mu	minimal overall average expression (`mu`) required.
min_n	minimal number of perturbed cells (`n1`) required.
min_x	minimal average expression of perturbed (`x1`) and non-perturbed cells (`x0`) required.
gene_col	gene index column name in `perturbations`
clone_col	clone index column name in `perturbations`
p_method	p-value adjustment method for multiple comparisons. See `stats::p.adjust`.
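
As a rough illustration of the input shapes described above, the following sketch builds toy inputs in base R. All values are made up, and it assumes that gene and clone names (rather than numeric indices) are accepted in the `gene` and `clone` columns; the call to `decal()` is left commented out because it needs the package installed.

```r
# Toy inputs illustrating the argument shapes (all values are made up).
set.seed(1)
genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:20)

# UMI count matrix: genes as rows, cells as columns
count <- matrix(
  rpois(length(genes) * length(cells), lambda = 3),
  nrow = length(genes),
  dimnames = list(genes, cells)
)

# List of cells per clone (assumption: cells are referenced by column name)
clone <- list(
  clone1 = cells[1:6],
  clone2 = cells[7:12]
)

# Clone and gene pairs to test, using the default column names
perturbations <- data.frame(
  gene  = c("gene1", "gene3"),
  clone = c("clone1", "clone2")
)

# With the package installed, the analysis would then be run as:
# res <- decal(perturbations, count, clone)
```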

Value

The `perturbations` table extended with the following columns:

n0 and n1: number of non-perturbed and perturbed cells
x0 and x1: average count of non-perturbed and perturbed cells
mu: overall average expression
theta: negative binomial dispersion parameter
xb: perturbed cells' estimated average count
z: perturbed cells' standardized effect (z-score)
lfc: perturbed cells' log2 fold-change effect
pvalue: p-value of the perturbation effect
p_adjusted: `pvalue` adjusted for multiple comparisons using `p_method`
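
The descriptive columns can be illustrated for a single clone and gene pair with base R. This is only a sketch of the definitions listed above, not the package's internal code: it assumes `mu` is the plain mean over all cells, and the p-values are hypothetical numbers chosen to show the `"BH"` adjustment.

```r
set.seed(42)
count <- matrix(
  rpois(4 * 10, lambda = 2), nrow = 4,
  dimnames = list(paste0("g", 1:4), paste0("c", 1:10))
)
# Cells belonging to the perturbed clone (made-up assignment)
perturbed <- colnames(count) %in% c("c1", "c2", "c3")

y  <- count["g2", ]
n1 <- sum(perturbed)        # number of perturbed cells
n0 <- sum(!perturbed)       # number of non-perturbed cells
x1 <- mean(y[perturbed])    # average count in perturbed cells
x0 <- mean(y[!perturbed])   # average count in non-perturbed cells
mu <- mean(y)               # overall average count (assumed definition)

# Hypothetical p-values adjusted with the default p_method
pvalue     <- c(0.001, 0.02, 0.03, 0.5)
p_adjusted <- p.adjust(pvalue, method = "BH")
```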

Details

Given a table of clone and gene pairs, a UMI count matrix, and a list of cells per clone, this function models gene expression (Y) with a negative binomial (a.k.a. Gamma-Poisson) distribution for each perturbation pair, as a function of X (the clone indicator variable) and offset by the cell total count (D), as described by the model:

$$Y \sim \mathrm{NB}(xb, \theta)$$
$$\log(xb) = \beta_0 + \beta_x X + \log(D)$$
$$\theta \sim \mu$$
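
A model of this form can be sketched for one simulated gene with `glm()` and the `MASS::negative.binomial()` family, treating theta as known. This is an illustration under stated assumptions, not necessarily how the package fits the model: the simulated values are made up, and the conversions of the coefficient of X into `lfc` (division by log(2)) and `z` (estimate over standard error) are assumed definitions.

```r
library(MASS)  # negative.binomial() family and rnegbin()

set.seed(7)
n     <- 500
X     <- rep(c(0, 1), times = c(400, 100))  # clone indicator
D     <- round(runif(n, 2000, 10000))       # total count per cell
theta <- 5                                  # dispersion, assumed known here
beta0 <- log(1e-3)                          # baseline rate per total count
betax <- log(0.5)                           # perturbation halves expression

# Simulate counts from the model: log(xb) = beta0 + betax * X + log(D)
Y <- rnegbin(n, mu = exp(beta0 + betax * X + log(D)), theta = theta)

# Fit the negative binomial regression with log(D) as an offset
fit <- glm(Y ~ X + offset(log(D)), family = negative.binomial(theta))

# Fix the GLM dispersion at 1 so standard errors follow the NB variance
est <- coef(summary(fit, dispersion = 1))["X", ]

lfc <- unname(est["Estimate"]) / log(2)             # natural log to log2
z   <- unname(est["Estimate"] / est["Std. Error"])  # standardized effect
```

With a true effect of log(0.5), the estimated `lfc` should land near -1.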

The gene dispersion parameter (theta) is estimated and regularized in two steps, as developed by Hafemeister & Satija (2019). First, for a subset of genes, it fits a Poisson regression offset by log(D) and estimates a crude theta with a maximum likelihood estimator based on the observed counts and the regression results. Next, it regularizes the theta estimates and extends them to all genes with a kernel smoother as a function of average count (mu).
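
The two steps can be sketched on simulated data with `MASS::theta.ml()` and `stats::ksmooth()`. This is a minimal illustration under assumptions, not the package's implementation: the intercept-only Poisson model, the log10 scale, the normal kernel, and the bandwidth of 0.5 are all choices made for this example.

```r
library(MASS)  # theta.ml() and rnegbin()

set.seed(11)
n_genes    <- 200
n_cells    <- 300
D          <- round(runif(n_cells, 2000, 10000))       # total count per cell
rate       <- exp(runif(n_genes, log(1e-3), log(1e-2))) # per-gene rate
true_theta <- 10

# Simulated count matrix: genes as rows, cells as columns
count <- t(sapply(rate, function(r) {
  rnegbin(n_cells, mu = r * D, theta = true_theta)
}))
mu <- rowMeans(count)

# Step 1: crude theta on a subset of genes
sampled <- sample(n_genes, 50)
crude <- sapply(sampled, function(g) {
  y   <- count[g, ]
  fit <- glm(y ~ 1 + offset(log(D)), family = poisson)
  # Maximum likelihood theta given observed counts and Poisson fitted means
  as.numeric(suppressWarnings(theta.ml(y, fitted(fit), limit = 50)))
})

# Step 2: smooth log10(theta) against log10(mu) and evaluate for every gene
ok <- is.finite(crude) & crude > 0
sm <- ksmooth(
  x         = log10(mu[sampled][ok]),
  y         = log10(crude[ok]),
  kernel    = "normal",
  bandwidth = 0.5,
  x.points  = log10(mu)
)

# ksmooth() returns its evaluation points sorted, so map them back to genes
theta_reg <- numeric(n_genes)
theta_reg[order(mu)] <- 10^sm$y
```

Smoothing on the log scale keeps the regularized values positive, and evaluating the smoother at every gene's `mu` is what extends the estimate beyond the sampled subset.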