Score a corrected or uncorrected PCA to estimate the contribution of S and G2M scores to variance

Linearly regresses each principal component on the S and G2M scores. The resulting R2 values are then weighted by each dimension's contribution to variance. Cell cycle scores and PCAs are computed for each batch independently.

Usage

ScoreRegressPC.CellCycle(
  object,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)

AddScoreRegressPC.CellCycle(
  object,
  integration,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)

Arguments

object: A Seurat object
batch.var: The name of the batch variable (must be in the object metadata)
what: The slot of the Seurat object to score. Can be a layer or a reduction.
dims.use: The dimensions of what to consider. All dimensions are used by default
npcs: Total number of PCs to compute and store (50 by default)
s.var: The name of the S phase score variable (must be in the object metadata)
g2m.var: The name of the G2M phase score variable (must be in the object metadata)
compute.cc: Whether to (re-)compute the cell cycle scores. Should be TRUE (default) unless you have run CellCycleScoringPerBatch beforehand, because cell cycle scores are expected to be computed per batch
s.features: A vector of features associated with S phase
g2m.features: A vector of features associated with G2M phase
assay: Assay to use. Passed to Seurat to automatically construct the batch.var when not provided. Ignored otherwise
weight.by: One of 'var' (default) or 'stdev' (standing for variance and standard deviation respectively). Use the variance or the standard deviation explained by the principal components to weight each PC's score.
adj.r2: Whether to use the adjusted R2 instead of the raw R2
approx: Use truncated singular value decomposition to approximate PCA
integration: name of the integration to score

Value

ScoreRegressPC.CellCycle: A two-column data frame with the batch variable in the first column and the corresponding score in the second. It has as many rows as there are batches.

AddScoreRegressPC.CellCycle: the updated Seurat object with the cell cycle conservation score set for the integration.

Details

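Each principal component is regressed on the two scores: $$PC_i = \beta_0 + \beta_1 S_{score} + \beta_2 G2M_{score} + \epsilon$$ where the $\beta$ are the fitted coefficients and $\epsilon$ is the residual.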
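
When adj.r2 = TRUE, the raw R squared is replaced by its adjusted counterpart. Assuming the conventional definition (the one reported by summary.lm in R; the exact implementation used by the package is not shown here), with $n$ cells in the batch and $k = 2$ predictors, it reads: $$R^2_{adj} = 1 - \left ( 1 - R^2 \right ) \frac{n - 1}{n - k - 1}$$ With thousands of cells per batch, the correction factor is close to 1, so raw and adjusted values are expected to be nearly identical.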

The score is computed as follows: $$\sum_{i=1}^{p} \left ( R^2_i \times V_i \right )$$

For a PCA with $p$ dimensions, $PC_i$ is principal component $i$, $R^2_i$ is the R squared coefficient of the linear regression for dimension $i$, and $V_i$ is the proportion of variance explained by $PC_i$.
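
The computation described above can be sketched in base R on simulated data. This is a minimal illustration, not the package's implementation: the helper pc_regression_score, the simulated matrix and the choice to normalise weights over all computed PCs are assumptions made for the example.

```r
# Illustrative sketch only: base R, simulated data, hypothetical helper names.
set.seed(42)

# Simulate 300 cells in 2 batches, 20 features and two cell cycle scores.
n.cells <- 300
batch <- rep(c("A", "B"), each = n.cells / 2)
s.score <- rnorm(n.cells)
g2m.score <- rnorm(n.cells)
expr <- matrix(rnorm(n.cells * 20), nrow = n.cells)
# Make the first five features depend on the cell cycle scores.
expr[, 1:5] <- expr[, 1:5] + 2 * s.score + g2m.score

# Hypothetical helper: weighted sum of per-PC R2 for the cells of one batch.
pc_regression_score <- function(expr, s, g2m, npcs = 10,
                                weight.by = c("var", "stdev"),
                                adj.r2 = FALSE) {
  weight.by <- match.arg(weight.by)
  pca <- prcomp(expr, center = TRUE, scale. = TRUE)
  pcs <- pca$x[, seq_len(npcs), drop = FALSE]
  # Assumption: weights are the share of variance (or standard deviation)
  # carried by each PC, relative to all PCs returned by prcomp.
  spread <- if (weight.by == "var") pca$sdev^2 else pca$sdev
  weights <- (spread / sum(spread))[seq_len(npcs)]
  # One regression per PC: PC_i ~ S + G2M.
  r2 <- apply(pcs, 2, function(pc) {
    fit <- summary(lm(pc ~ s + g2m))
    if (adj.r2) fit$adj.r.squared else fit$r.squared
  })
  sum(r2 * weights)
}

# PCA and regressions are run independently on the cells of each batch.
cells.by.batch <- split(seq_len(n.cells), batch)
scores <- sapply(cells.by.batch, function(idx) {
  pc_regression_score(expr[idx, ], s.score[idx], g2m.score[idx])
})
scores.adj <- sapply(cells.by.batch, function(idx) {
  pc_regression_score(expr[idx, ], s.score[idx], g2m.score[idx], adj.r2 = TRUE)
})

# One row per batch, mirroring the shape described in the Value section.
result <- data.frame(batch = names(scores), score = unname(scores))
print(result)
```

Because each raw R2 lies between 0 and 1 and the weights sum to at most 1, the raw score stays in that range; a value near 0 suggests that the cell cycle scores explain little of the variance captured by the retained PCs.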

Note

This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022). DOI

Examples

if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

score.cc.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30)
score.cc.adj.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30, adj.r2 = TRUE)

score.cc.r2        # ~ 0.0249
score.cc.adj.r2    # ~ 0.0249
} # }