Skip to contents

A batch-wise kernel density estimate is computed for each dimension of the PCA. Then, the global area shared by all kernels is calculated and divided by a reference area (see Arguments and Details) as a proxy to batch mixing. Finally, the aforementioned proportions are weighted by each dimension's contribution to variance.

Usage

ScoreDensityPC(
  object,
  batch.var = NULL,
  reduction = "pca",
  dims = NULL,
  use.union = TRUE,
  bw.join = mean,
  bw.kernel = c("nrd0", "nrd", "ucv", "bcv", "sj"),
  weight.by = c("var", "stdev"),
  assay = NULL,
  layer = NULL
)

AddScoreDensityPC(
  object,
  integration,
  batch.var = NULL,
  reduction = "pca",
  dims = NULL,
  use.union = TRUE,
  bw.join = mean,
  bw.kernel = c("nrd0", "nrd", "ucv", "bcv", "sj"),
  weight.by = c("var", "stdev"),
  assay = NULL,
  layer = NULL
)

Arguments

object

A Seurat object

batch.var

The name of the batch variable (must be in the object metadata)

reduction

The name of the reduction to score

dims

The dimensions to consider. All dimensions are used by default

use.union

Whether to use a union of densities (constructs a fake density with pmax). The alternative is to use all density kernels and sum them

bw.join

A function to pick a joint bandwidth. The default is to use the mean(). You can also chose median for instance. If you want to use a distinct bandwidth for each batch, just set bw.join = c (not recommended)

bw.kernel

Bandwidth selector to use for Gaussian kernel density estimation. One of 'nrd0', 'nrd', 'ucv','bcv' or 'sj' (see bandwidths). nrd0 is the default

weight.by

one of 'var' (default) or 'stdev' (standing for variance and standard deviation respectively). Use the variance or the standard deviation explained by the principal components to weight the each PC's score.

assay

assay to use. Passed to Seurat to automatically construct the batch.var when not provided. Useless otherwise

layer

layer to use. Passed to Seurat to automatically construct the batch.var when not provided. Useless otherwise

integration

name of the integration to score

Value

ScoreDensityPC: A single float corresponding to the score of the given reduction

AddScoreDensityPC: the updated Seurat object with the density PCA score set for the integration.

Details

The score is computed as follow : $$\sum_{i=1}^{p} \left ( \cfrac{ \bigcup_{b=1}^n D_{bi} }{ Ref_i } * V_i \right )$$

For a PCA with p dimensions and n batchs. \(D_{bi}\) is the vector of density of batch b on the dimension i. \(V_i\) is the proportion of variance explained by the \(PC_i\).

with \(Ref = \bigcap_{i=1}^n D_{bi}\) or \(Ref = \sum_{i=1}^n D_{bi}\) when use.union is TRUE and FALSE respectively, for a given dimension \(i\).

use.union = FALSE tends to result in lower score since the reference is the sum of kernel densities instead of maximum

Note

This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2021). DOI

See also

ScoreRegressPC to compute the PCR score

Examples

if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

score.union <- ScoreDensityPC(obj, "Method", "pca", dim = 1:30)
score.sum <- ScoreDensityPC(obj, "Method", "pca", dim = 1:30, use.union = FALSE)

score.union   # ~ 0.1511
score.sum     # ~ 0.0319
} # }