Score a corrected or uncorrected PCA to estimate batch mixing
Source:R/metrics_pca.R
score-densityPC.Rd
A batch-wise kernel density estimate is computed for each dimension of the PCA. Then, the global area shared by all kernels is calculated and divided by a reference area (see Arguments and Details) as a proxy to batch mixing. Finally, the aforementioned proportions are weighted by each dimension's contribution to variance.
Usage
ScoreDensityPC(
object,
batch.var = NULL,
reduction = "pca",
dims = NULL,
use.union = TRUE,
bw.join = mean,
bw.kernel = c("nrd0", "nrd", "ucv", "bcv", "sj"),
weight.by = c("var", "stdev"),
assay = NULL,
layer = NULL
)
AddScoreDensityPC(
object,
integration,
batch.var = NULL,
reduction = "pca",
dims = NULL,
use.union = TRUE,
bw.join = mean,
bw.kernel = c("nrd0", "nrd", "ucv", "bcv", "sj"),
weight.by = c("var", "stdev"),
assay = NULL,
layer = NULL
)
Arguments
- object
A Seurat object
- batch.var
The name of the batch variable (must be in the object metadata)
- reduction
The name of the reduction to score
- dims
The dimensions to consider. All dimensions are used by default
- use.union
Whether to use a union of densities (constructs a fake density with pmax). The alternative is to use all density kernels and sum them
- bw.join
A function to pick a joint bandwidth. The default is to use the
mean()
. You can also chosemedian
for instance. If you want to use a distinct bandwidth for each batch, just setbw.join = c
(not recommended)- bw.kernel
Bandwidth selector to use for Gaussian kernel density estimation. One of 'nrd0', 'nrd', 'ucv','bcv' or 'sj' (see bandwidths). nrd0 is the default
- weight.by
one of 'var' (default) or 'stdev' (standing for variance and standard deviation respectively). Use the variance or the standard deviation explained by the principal components to weight the each PC's score.
- assay
assay to use. Passed to Seurat to automatically construct the
batch.var
when not provided. Useless otherwise- layer
layer to use. Passed to Seurat to automatically construct the
batch.var
when not provided. Useless otherwise- integration
name of the integration to score
Value
ScoreDensityPC
: A single float corresponding to the score of
the given reduction
AddScoreDensityPC
: the updated Seurat object
with the density
PCA score set for the integration.
Details
The score is computed as follow : $$\sum_{i=1}^{p} \left ( \cfrac{ \bigcup_{b=1}^n D_{bi} }{ Ref_i } * V_i \right )$$
For a PCA with p dimensions and n batchs. \(D_{bi}\) is the vector of density of batch b on the dimension i. \(V_i\) is the proportion of variance explained by the \(PC_i\).
with \(Ref = \bigcap_{i=1}^n D_{bi}\) or \(Ref = \sum_{i=1}^n D_{bi}\)
when use.union
is TRUE
and FALSE
respectively, for a
given dimension \(i\).
use.union = FALSE
tends to result in lower score since the
reference is the sum of kernel densities instead of maximum
Note
This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2021). DOI
See also
ScoreRegressPC
to compute the PCR score
Examples
if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
score.union <- ScoreDensityPC(obj, "Method", "pca", dim = 1:30)
score.sum <- ScoreDensityPC(obj, "Method", "pca", dim = 1:30, use.union = FALSE)
score.union # ~ 0.1511
score.sum # ~ 0.0319
} # }