Score a corrected or uncorrected PCA to estimate the contribution of S and G2M scores to variance
Source: R/metrics_pca.R
score-cc.Rd
Linearly regresses each principal component on the S and G2M scores. The resulting R2 values are then weighted by each dimension's contribution to variance. Cell cycle scores and PCAs are computed for each batch independently.
Usage
ScoreRegressPC.CellCycle(
object,
batch.var = NULL,
what = NULL,
dims.use = NULL,
npcs = 50L,
s.var = "S.Score",
g2m.var = "G2M.Score",
compute.cc = TRUE,
s.features = NULL,
g2m.features = NULL,
assay = NULL,
weight.by = c("var", "stdev"),
adj.r2 = FALSE,
approx = FALSE
)
AddScoreRegressPC.CellCycle(
object,
integration,
batch.var = NULL,
what = NULL,
dims.use = NULL,
npcs = 50L,
s.var = "S.Score",
g2m.var = "G2M.Score",
compute.cc = TRUE,
s.features = NULL,
g2m.features = NULL,
assay = NULL,
weight.by = c("var", "stdev"),
adj.r2 = FALSE,
approx = FALSE
)
Arguments
- object
A Seurat object
- batch.var
The name of the batch variable (must be in the object metadata)
- what
the slot from Seurat to score. Can be a layer or a reduction.
- dims.use
The dimensions from
what
to consider. All dimensions are used by default
- npcs
Total number of PCs to compute and store (50 by default)
- s.var
The name of the S phase score variable (must be in the object metadata)
- g2m.var
The name of the G2M phase score variable (must be in the object metadata)
- compute.cc
Whether to (re-)compute the cell cycle scores. Should be
TRUE
(default), unless you have run
CellCycleScoringPerBatch
beforehand, because cell cycle scores are expected to be computed per batch
- s.features
A vector of features associated with S phase
- g2m.features
A vector of features associated with G2M phase
- assay
Assay to use. Passed to Seurat to automatically construct the
batch.var
when not provided. Ignored otherwise
- weight.by
One of 'var' (default) or 'stdev' (variance and standard deviation, respectively). Use the variance or the standard deviation explained by the principal components to weight each PC's score.
- adj.r2
Whether to use the adjusted R2 instead of the raw R2
- approx
Use truncated singular value decomposition to approximate PCA
- integration
name of the integration to score
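The effect of
weight.by
can be illustrated with a small base R sketch. It uses the built-in USArrests dataset rather than single-cell data, and it assumes the weights are normalised to sum to 1 over the retained PCs; the package may normalise differently.

```r
# Illustration only: PCA on a built-in dataset, not on a Seurat object.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Assumed equivalent of weight.by = "var": proportion of variance per PC
w.var <- pca$sdev^2 / sum(pca$sdev^2)

# Assumed equivalent of weight.by = "stdev": proportion of standard deviation per PC
w.stdev <- pca$sdev / sum(pca$sdev)

# Side-by-side comparison of the two weighting schemes
round(rbind(var = w.var, stdev = w.stdev), 3)
```

Under this assumption, weighting by standard deviation gives less weight to the leading PCs and more to the trailing ones than weighting by variance.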
Value
ScoreRegressPC.CellCycle
: A 2-column data frame with the
batch variable in the first one and the corresponding score in the second
one. It has as many rows as batches.
AddScoreRegressPC.CellCycle
: the updated Seurat object
with the
cell cycle conservation score set for the integration.
Details
The linear regression is $$PC_i = S_{score} + G2M_{score}$$
The score is computed as follows: $$\sum_{i=1}^{p} \left ( R^2_i * V_i \right )$$
For a PCA with p dimensions, \(PC_i\) is the principal component i, \(R^2_i\) is the R squared coefficient of the linear regression for dimension i, and \(V_i\) is the proportion of variance explained by \(PC_i\).
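The computation described above can be sketched in base R for a single batch. This is a minimal illustration on simulated data, not the package's implementation: the variable names are hypothetical, and normalising \(V_i\) over the p retained PCs is an assumption of the sketch.

```r
# Minimal base R sketch of the weighted R2 score for one batch.
# All data are simulated; names are illustrative only.
set.seed(42)
n.cells <- 200
n.genes <- 30

# Hypothetical per-cell cell cycle scores
s.score <- rnorm(n.cells)
g2m.score <- rnorm(n.cells)

# Simulated expression matrix (cells x genes); the first 5 genes
# partly depend on the cell cycle scores
expr <- matrix(rnorm(n.cells * n.genes), nrow = n.cells)
expr[, 1:5] <- expr[, 1:5] + 2 * s.score + g2m.score

# PCA on the centred and scaled matrix, keeping p dimensions
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
p <- 10
embeddings <- pca$x[, seq_len(p)]

# V_i: proportion of variance explained by each retained PC
# (normalised over the p retained PCs, an assumption of this sketch)
v <- pca$sdev[seq_len(p)]^2
v <- v / sum(v)

# R2_i: regress each PC on the S and G2M scores
fits <- lapply(seq_len(p), function(i) {
  summary(lm(embeddings[, i] ~ s.score + g2m.score))
})
r2 <- vapply(fits, function(f) f$r.squared, numeric(1))
adj.r2 <- vapply(fits, function(f) f$adj.r.squared, numeric(1))

# Weighted sums: raw and adjusted variants of the score
score <- sum(r2 * v)
score.adj <- sum(adj.r2 * v)
```

Because the adjusted R2 of each regression is never larger than its raw R2, the adjusted score in this sketch is never larger than the raw score.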
Note
This score is an adaptation of the principal component regression (PCR) score from Luecken M.D. et al., 2022.
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
See also
CellCycleScoringPerBatch
to compute cell cycle scores per
batch. ScoreRegressPC
to regress PCs by batch.
Examples
if (FALSE) { # \dontrun{
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
score.cc.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30)
score.cc.adj.r2 <- ScoreRegressPC.CellCycle(obj, "Method", "pca", dims.use = 1:30, adj.r2 = TRUE)
score.cc.r2 # ~ 0.0249
score.cc.adj.r2 # ~ 0.0249
} # }