
Score a corrected or uncorrected PCA to estimate the contribution of S and G2M scores to variance
Source: R/metrics_pca.R
Fits a linear regression that predicts each principal component from the S and G2M scores. The resulting R2 values are then weighted by each dimension's contribution to variance. Cell cycle scores and PCAs are computed for each batch independently.
Usage
ScoreRegressPC.CellCycle(
  object,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)
AddScoreRegressPC.CellCycle(
  object,
  integration,
  batch.var = NULL,
  what = NULL,
  dims.use = NULL,
  npcs = 50L,
  s.var = "S.Score",
  g2m.var = "G2M.Score",
  compute.cc = TRUE,
  s.features = NULL,
  g2m.features = NULL,
  assay = NULL,
  weight.by = c("var", "stdev"),
  adj.r2 = FALSE,
  approx = FALSE
)
Arguments
- object
- A Seurat object 
- batch.var
- The name of the batch variable (must be in the object metadata) 
- what
- the slot from Seurat to score. Can be a layer or a reduction. 
- dims.use
- The dimensions from what (the layer or reduction being scored) to consider. All dimensions are used by default.
- npcs
- Total number of PCs to compute and store (50 by default)
- s.var
- The name of the S phase score variable (must be in the object metadata) 
- g2m.var
- The name of the G2M phase score variable (must be in the object metadata) 
- compute.cc
- Whether to (re-)compute the cell cycle scores. Should be TRUE (default) unless you have run CellCycleScoringPerBatch beforehand, because cell cycle scores are expected to be computed per batch.
- s.features
- A vector of features associated with S phase 
- g2m.features
- A vector of features associated with G2M phase 
- assay
- Assay to use. Passed to Seurat to automatically construct batch.var when it is not provided. Ignored otherwise.
- weight.by
- One of 'var' (default) or 'stdev' (standing for variance and standard deviation respectively). Use the variance or the standard deviation explained by the principal components to weight each PC's score.
- adj.r2
- Whether to use the adjusted R2 instead of the raw R2 
- approx
- Use truncated singular value decomposition to approximate PCA 
- integration
- name of the integration to score 
Value
ScoreRegressPC.CellCycle: A two-column data frame with the
batch variable in the first column and the corresponding score in the
second. It has as many rows as there are batches.
AddScoreRegressPC.CellCycle: the updated Seurat object with the
cell cycle conservation score set for the integration.
Details
The linear regression is $$PC_i = S_{score} + G2M_{score}$$
The score is computed as follows: $$\sum_{i=1}^{p} \left ( R^2_i * V_i \right )$$
For a PCA with p dimensions, \(PC_i\) is principal component i, \(R^2_i\) is the R squared coefficient of the linear regression for dimension i, and \(V_i\) is the proportion of variance explained by \(PC_i\).
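The formula above can be sketched with base R on simulated data. This is only an illustration under stated assumptions, not the package's implementation: the expression matrix, the variable names (s_score, g2m_score, expr, cc_score) and the use of prcomp are all hypothetical stand-ins, and a single batch is assumed.

```r
# Illustrative sketch on simulated data (single batch); names are hypothetical.
set.seed(42)
n <- 200
s_score   <- rnorm(n)  # stand-in for the S.Score metadata column
g2m_score <- rnorm(n)  # stand-in for the G2M.Score metadata column

# Simulated cells x features matrix in which the first five features
# partly depend on the cell cycle scores.
expr <- matrix(rnorm(n * 20), nrow = n)
expr[, 1:5] <- expr[, 1:5] + 2 * s_score + g2m_score

pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# R^2_i: regress each principal component on both scores.
r2 <- apply(pca$x, 2, function(pc) {
  summary(lm(pc ~ s_score + g2m_score))$r.squared
})

# V_i: proportion of variance explained by each component.
v <- pca$sdev^2 / sum(pca$sdev^2)

# Weighted sum over all p dimensions.
cc_score <- sum(r2 * v)
cc_score
```

Because the weights sum to 1 and each R squared lies between 0 and 1, the resulting score is also bounded by 0 and 1. Reading adj.r.squared instead of r.squared from the model summary would presumably correspond to adj.r2 = TRUE, though that mapping is an assumption here.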
Note
This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
See also
CellCycleScoringPerBatch to compute cell cycle scores per
batch. ScoreRegressPC to regress PCs by batch.
Examples
if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
# The argument is named dims.use (see Usage above)
score.cc.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30)
score.cc.adj.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30, adj.r2 = TRUE)
score.cc.r2        # ~ 0.0249
score.cc.adj.r2    # ~ 0.0249
} # }