
Fits a linear regression of each principal component on the S and G2M scores. The resulting R2 values are then weighted by each dimension's contribution to variance. Cell cycle scores and PCAs are computed for each batch independently.

Usage

ScoreRegressPC.CellCycle(
  object,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)

AddScoreRegressPC.CellCycle(
  object,
  integration,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)

Arguments

object

A Seurat object

batch.var

The name of the batch variable (must be in the object metadata)

what

The slot of the Seurat object to score. Can be a layer or a reduction.

dims.use

The dimensions from what to consider. All dimensions are used by default

npcs

Total number of PCs to compute and store (50 by default)

s.var

The name of the S phase score variable (must be in the object metadata)

g2m.var

The name of the G2M phase score variable (must be in the object metadata)

compute.cc

Whether to (re-)compute the cell cycle scores. Should be TRUE (default) unless you have already run CellCycleScoringPerBatch, because cell cycle scores are expected to be computed per batch.

s.features

A vector of features associated with S phase

g2m.features

A vector of features associated with G2M phase

assay

Assay to use. Passed to Seurat to automatically construct the batch.var when it is not provided. Ignored otherwise.

weight.by

One of 'var' (default) or 'stdev' (standing for variance and standard deviation respectively). Use the variance or the standard deviation explained by the principal components to weight each PC's score.

adj.r2

Whether to use the adjusted R2 instead of the raw R2

approx

Use truncated singular value decomposition to approximate PCA

integration

name of the integration to score
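
As an illustration of the two weight.by options, the minimal sketch below derives both kinds of weights from a base R PCA. It uses stats::prcomp on simulated data instead of a Seurat reduction. The variable names (dims.use, w.var, w.stdev) are hypothetical, and it assumes that weights are normalised to sum to one over the retained PCs. The package may normalise them differently.

```r
set.seed(1)
# Simulated data (not real cells): 100 observations x 10 features
# with decreasing spread, so that PCs explain unequal amounts of variance
x <- matrix(rnorm(100 * 10), nrow = 100) %*% diag(seq(3, 0.3, length.out = 10))
pca <- prcomp(x, center = TRUE, scale. = FALSE)

dims.use <- 1:5                      # hypothetical subset of PCs
stdev <- pca$sdev[dims.use]          # standard deviation of each retained PC

# Assumption: weights are proportions over the retained PCs
w.var   <- stdev^2 / sum(stdev^2)    # analogous to weight.by = "var"
w.stdev <- stdev   / sum(stdev)      # analogous to weight.by = "stdev"

round(rbind(var = w.var, stdev = w.stdev), 3)
```

Under this assumption, variance weights put more mass on the leading PC than standard deviation weights do, so 'stdev' gives lower-ranked dimensions relatively more influence on the final score.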

Value

ScoreRegressPC.CellCycle: A two-column data frame with the batch variable in the first column and the corresponding score in the second. It has one row per batch.

AddScoreRegressPC.CellCycle: the updated Seurat object with the cell cycle conservation score set for the integration.

Details

The linear regression is $$PC_i = S_{score} + G2M_{score}$$

The score is computed as follows: $$\sum_{i=1}^{p} \left ( R^2_i \times V_i \right )$$

For a PCA with p dimensions, \(PC_i\) is the i-th principal component, \(R^2_i\) is the R squared coefficient of the linear regression for dimension i, and \(V_i\) is the proportion of variance explained by \(PC_i\).
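
The computation described above can be sketched in base R for a single batch. This is an illustration on simulated data, not the package's implementation. The scores, the matrix x and the choice of p are made up for the example, and \(V_i\) is assumed to be the proportion of variance relative to the p retained PCs.

```r
set.seed(42)
n <- 200
# Simulated cell cycle scores for one batch (stand-ins for S.Score / G2M.Score)
s.score   <- rnorm(n)
g2m.score <- rnorm(n)

# Simulated expression-like matrix; the first 3 features carry cell cycle signal
x <- matrix(rnorm(n * 20), nrow = n)
x[, 1:3] <- x[, 1:3] + 2 * s.score + g2m.score

pca <- prcomp(x, center = TRUE, scale. = FALSE)
p <- 10                                   # number of PCs kept (hypothetical)
pcs <- pca$x[, 1:p]

# R^2_i of the regression PC_i ~ S_score + G2M_score, one model per PC
r2 <- apply(pcs, 2, function(pc) {
  summary(lm(pc ~ s.score + g2m.score))$r.squared
})

# V_i: proportion of variance explained (assumed relative to the p retained PCs)
v <- pca$sdev[1:p]^2 / sum(pca$sdev[1:p]^2)

# Weighted sum over dimensions
score <- sum(r2 * v)
score
```

Because the weights sum to one and each \(R^2_i\) lies between 0 and 1, the score in this sketch also lies between 0 and 1. Replacing `$r.squared` with `$adj.r.squared` would give the adjusted variant, which is presumably what adj.r2 = TRUE selects. In the function itself, this procedure is repeated for every batch.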

Note

This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022). DOI

See also

CellCycleScoringPerBatch to compute cell cycle scores per batch. ScoreRegressPC to regress PCs on batch.

Examples

if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

# the argument is named dims.use (dim.use does not match any formal argument)
score.cc.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30)
score.cc.adj.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30, adj.r2 = TRUE)

score.cc.r2        # ~ 0.0249
score.cc.adj.r2    # ~ 0.0249
} # }