This function performs differential expression analysis of clonal alterations for pairs of clonal sub-populations and perturbed genes.
decal(perturbations, count, clone, theta = NULL, theta_sample = 2000,
      min_mu = 0.05, min_n = 3, min_x = 1, gene_col = "gene",
      clone_col = "clone", p_method = "BH")
Argument | Description |
---|---|
perturbations | table of clone and gene perturbation pairs for which to model the differential expression effect. |
count | UMI count matrix with cells as columns and genes (or features) as rows. |
clone | list of cells per clone. |
theta | gene (or feature) dispersion parameter; when NULL (the default), it is estimated from count as described below. |
theta_sample | number of genes sampled for the preliminary (crude) theta estimate. |
min_mu | minimal overall average expression. |
min_n | minimal number of perturbed cells. |
min_x | minimal average expression of perturbed cells. |
gene_col | name of the gene index column. |
clone_col | name of the clone index column. |
p_method | p-value adjustment method for multiple comparisons. |
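The argument table leaves the exact shape of the inputs implicit. The R sketch below builds a toy set of inputs under stated assumptions: that clone is a named list whose names are the values found in the clone column of perturbations, and that its elements are cell identifiers matching colnames(count). All names used here (cloneA, cloneB, gene1, cell1, ...) are hypothetical.

```r
# Hypothetical toy inputs; the shapes are assumptions inferred from the
# argument descriptions, not taken from the package's own examples.
set.seed(1)

# UMI count matrix: genes (features) as rows, cells as columns
n_genes <- 50
n_cells <- 120
count <- matrix(
  rpois(n_genes * n_cells, lambda = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Assumed: list of cells per clone, named by clone id, with cell ids
# matching colnames(count)
clone <- list(
  cloneA = colnames(count)[1:40],
  cloneB = colnames(count)[41:80]
)

# Clone and gene perturbation pairs; column names match the defaults
# gene_col = "gene" and clone_col = "clone"
perturbations <- data.frame(
  clone = c("cloneA", "cloneB"),
  gene  = c("gene1", "gene2")
)
```

If these shapes are right, the call would then read decal(perturbations, count, clone), with every other argument left at its default.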
The function returns the perturbations table extended with the following columns:

- n0 and n1: number of non-perturbed and perturbed cells
- x0 and x1: average count in non-perturbed and perturbed cells
- mu: overall average expression
- theta: negative binomial dispersion parameter
- xb: estimated average count in perturbed cells
- z: standardized z-score of the effect in perturbed cells
- lfc: log2 fold-change of the effect in perturbed cells
- pvalue: p-value of the effect
- p_adjusted: p-value adjusted for multiple comparisons with p_method
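To make the descriptive columns concrete, here is a minimal sketch of how n0, n1, x0, x1 and mu could be computed for a single clone and gene pair. The helper pair_summary() is hypothetical: it assumes that the non-perturbed cells are all cells outside the clone and that averages are taken over raw counts, which the package may not do. The filter line is likewise only an assumed reading of min_n, min_x and min_mu. The last line uses base R's p.adjust(), which accepts "BH" as a method, as one plausible way p_adjusted relates to p_method.

```r
# Hypothetical helper; assumes perturbed = cells in the clone,
# non-perturbed = every other cell, and averages over raw counts.
pair_summary <- function(count, cells1, gene) {
  y <- count[gene, ]
  in_clone <- colnames(count) %in% cells1
  data.frame(
    n0 = sum(!in_clone),       # number of non-perturbed cells
    n1 = sum(in_clone),        # number of perturbed cells
    x0 = mean(y[!in_clone]),   # average count, non-perturbed cells
    x1 = mean(y[in_clone]),    # average count, perturbed cells
    mu = mean(y)               # overall average expression
  )
}

count <- matrix(c(0, 2, 4, 1, 1, 1), nrow = 1,
                dimnames = list("g1", paste0("c", 1:6)))
s <- pair_summary(count, cells1 = c("c1", "c2", "c3"), gene = "g1")
# n0 = 3, n1 = 3, x0 = 1, x1 = 2, mu = 1.5

# Assumed use of the filters with their default values
keep <- s$n1 >= 3 && s$x1 >= 1 && s$mu >= 0.05

# One plausible way to obtain p_adjusted from a vector of p-values
p_adjusted <- p.adjust(c(0.01, 0.04, 0.30), method = "BH")
```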
Given a table of clone and gene pairs, a UMI count matrix, and a list of cells per clone, this function models gene expression (Y) with a negative binomial (a.k.a. Gamma-Poisson) distribution for each perturbation pair as a function of X (the clone indicator variable), offset by the cell total count (D), as described by the model:
$$Y \sim \mathrm{NB}(\mathit{xb}, \theta)$$
$$\log(\mathit{xb}) = \beta_0 + \beta_x X + \log(D)$$
$$\theta \sim \mu$$
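The model above can be fitted for one pair by direct maximum likelihood. The sketch below is not decal's implementation: fit_pair() is a hypothetical helper that holds theta fixed, maximizes the negative binomial log-likelihood with optim(), and derives Wald-type statistics. It assumes X is coded 0/1 (1 for cells in the clone), that z = beta_x / se(beta_x), that lfc = beta_x / log(2), that the p-value is two-sided normal, and that xb is the mean fitted count over perturbed cells.

```r
# Hypothetical helper fit_pair(): fits log(xb) = b0 + bx * X + log(D) for one
# clone/gene pair by maximum likelihood with theta held fixed. The formulas
# for z, lfc and pvalue below are assumptions, not decal's documented ones.
fit_pair <- function(y, X, D, theta) {
  eta <- function(b) b[1] + b[2] * X + log(D)    # linear predictor with offset
  nll <- function(b) {
    -sum(dnbinom(y, size = theta, mu = exp(eta(b)), log = TRUE))
  }
  grad <- function(b) {
    mu <- exp(eta(b))
    w <- theta * (y - mu) / (theta + mu)         # d loglik / d eta, per cell
    -c(sum(w), sum(w * X))
  }
  start <- c(log(sum(y) / sum(D)), 0)            # pooled rate, no clone effect
  opt <- optim(start, nll, grad, method = "BFGS", hessian = TRUE,
               control = list(maxit = 1000))
  se <- sqrt(diag(solve(opt$hessian)))           # Wald standard errors
  bx <- opt$par[2]
  z <- bx / se[2]
  list(
    xb = mean(exp(eta(opt$par))[X == 1]),        # fitted mean, perturbed cells
    z = z,
    lfc = bx / log(2),                           # natural log -> log2 scale
    pvalue = 2 * pnorm(-abs(z))                  # two-sided normal p-value
  )
}
```

Supplying the analytic gradient, theta (y - xb) / (theta + xb) per cell on the log scale, keeps the optimizer and the finite-difference Hessian used for the standard errors more stable than purely numeric gradients.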
The gene dispersion parameter (theta) is estimated and regularized in two steps, following Hafemeister & Satija (2019). First, for a subset of genes, it fits a Poisson regression offset by log(D) and estimates a crude theta by maximum likelihood from the observed counts and the regression results. Next, it regularizes the theta estimates and extends them to all genes with a kernel smoothing function of the average count (mu).
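A rough sketch of this two-step procedure in base R follows. estimate_theta() and its bandwidth argument are hypothetical, and several details are assumptions that may differ from the choices made in the package or in Hafemeister & Satija (2019): an intercept-only Poisson model per gene, a bounded one-dimensional search for log(theta), and smoothing of log10(theta) against log10(mu) with stats::ksmooth() using a normal kernel and a fixed bandwidth.

```r
# Hypothetical two-step dispersion estimate (assumptions noted inline).
estimate_theta <- function(count, theta_sample = 2000, bandwidth = 1) {
  D <- colSums(count)                        # cell total counts
  count <- count[, D > 0, drop = FALSE]      # log(D) needs nonzero totals
  D <- D[D > 0]
  mu_all <- rowMeans(count)                  # average count per gene
  genes <- which(mu_all > 0)
  genes <- genes[sample.int(length(genes), min(theta_sample, length(genes)))]

  crude <- vapply(genes, function(g) {
    y <- count[g, ]
    # Step 1a: Poisson regression offset by log(D) (assumed intercept-only)
    fit <- glm(y ~ 1, family = poisson, offset = log(D))
    mu <- fitted(fit)
    # Step 1b: maximum likelihood theta given the Poisson fitted means,
    # searched over a bounded range of log(theta)
    nll <- function(lt) -sum(dnbinom(y, size = exp(lt), mu = mu, log = TRUE))
    exp(optimize(nll, interval = c(-5, 10))$minimum)
  }, numeric(1))

  # Step 2: smooth log10(theta) against log10(mu) and predict for all genes.
  # Prediction points are clamped to the sampled range so that every point
  # has sampled genes inside the kernel window.
  xs <- log10(mu_all[genes])
  xp <- pmin(pmax(log10(mu_all), min(xs)), max(xs))
  sm <- ksmooth(xs, log10(crude), kernel = "normal",
                bandwidth = bandwidth, x.points = xp)
  theta <- numeric(length(xp))
  theta[order(xp)] <- 10^sm$y                # ksmooth returns sorted x.points
  names(theta) <- rownames(count)
  theta
}
```

Mapping the smoothed values back through order(xp) matters because ksmooth() evaluates the fit at sorted points rather than in the order supplied.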