
Score a dimension reduction by correlating normalised centroid distances
Source: R/metrics_scGraph.R
score-scgraph.Rd
Compute a score based on correlation between batch-wise and dataset-wise normalised centroid distances between cell types. A PCA is computed for each batch and is compared to the reduction output by an integration method.
Usage
ScoreScGraph(
  object,
  cell.var,
  batch.var = NULL,
  reduction = Reductions(object),
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)
AddScoreScGraph(
  object,
  integration,
  cell.var,
  batch.var = NULL,
  reduction,
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)
Arguments
- object
A Seurat object
- cell.var
The name(s) of the column(s) holding the cell type labels (must be in the object metadata). Multiple column names are accepted
- batch.var
The name of the batch variable (must be in the object metadata). Can be omitted if the Seurat object contains multiple layers
- reduction
The name of the reduction(s) to score. Multiple names are allowed
- ndims.use
Number of dimensions from reduction to compute centroids with. If the lengths of reduction and ndims.use do not match, ndims.use is recycled or trimmed to have the same length as reduction.
- ndims.batch
Number of principal components to compute for batch-wise PCAs (and to compute centroids with)
- nfeatures
Number of features to scale before computing batch-wise PCAs
- approx
Whether to use truncated singular value decomposition to approximate batch-wise PCAs
- trim
The fraction (0 to 0.5) of observations to be trimmed from each end of PCs or reductions when computing centroids
- min_cells_type
Threshold below which cell types containing too few cells will be excluded
- min_cells_batch
Threshold below which batches containing too few cells will be excluded
- cor_method
Method used to compute the correlation. One of "weighted_pearson", "pearson" and "spearman"
- assay
The name of the assay to use
- verbose
Whether to print progress messages
- integration
Name of the integration(s) to score. Must have the same length as the reduction argument
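The recycling rule described for ndims.use can be pictured with base R's rep_len(), which recycles or truncates a vector to a target length. This is only a sketch of the documented behaviour, not the package's internal code, and the reduction names below are made up for illustration.

```r
# Hypothetical reduction names, for illustration only
reduction <- c("pca", "harmony", "scvi")

# Fewer values than reductions: the values are recycled
recycled <- rep_len(c(30L, 20L), length.out = length(reduction))
recycled  # 30 20 30

# More values than reductions: the extra values are dropped
trimmed <- rep_len(c(30L, 20L, 10L, 5L), length.out = length(reduction))
trimmed   # 30 20 10
```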
Value
ScoreScGraph: a named list, with one element per reduction. Each element is a named vector of raw scores (names are cell type variables).
AddScoreScGraph: the updated Seurat object with the scGraph raw score(s) set for the integrations.
Details
For each batch, a PCA is computed using batch-specific variable features. Then, a centroid (trimmed mean) is computed for each cell type and each PC. Cell-type to cell-type centroid distances are then computed for each PC. For each batch, distances are then normalised (divided by the largest distance) in a cell-type-wise manner and averaged across batches to obtain a single mean distance value between cell types.
The procedure is then partially repeated once (computation of centroids and distances, and normalisation of these distances) on the (un)integrated dimensional reduction of the full dataset.
Finally, the correlation between the first (batch-averaged) and the second (full-dataset) distance values is computed for each cell type. The score is obtained by averaging the correlation values. It is bounded by -1 (worst) and 1 (best).
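The steps above can be sketched in base R on simulated data. This is a simplified illustration and not the package's implementation: the toy data, the helper names centroids() and norm_dist(), the use of Euclidean distances across all dimensions, the exclusion of self-distances, and the plain Pearson correlation are all assumptions made for this example. Feature selection, scaling and the cell-count thresholds are skipped.

```r
set.seed(42)

# Toy data (made up): 5 cell types x 2 batches x 40 cells, 6 features
types <- rep(rep(LETTERS[1:5], each = 40), times = 2)
batch <- rep(c("b1", "b2"), each = 200)
centres <- matrix(rnorm(5 * 6, sd = 3), nrow = 5,
                  dimnames = list(LETTERS[1:5], NULL))
expr <- unname(centres[types, ]) + matrix(rnorm(400 * 6), nrow = 400)

# Trimmed-mean centroid of each cell type along each dimension
centroids <- function(emb, types, trim = 0.2) {
  t(sapply(sort(unique(types)), function(ty) {
    apply(emb[types == ty, , drop = FALSE], 2, mean, trim = trim)
  }))
}

# Centroid-to-centroid distances, each row divided by its largest distance
norm_dist <- function(emb, types, trim = 0.2) {
  d <- as.matrix(dist(centroids(emb, types, trim)))
  d / apply(d, 1, max)
}

# Batch-wise PCAs, then normalised distances averaged across batches
batch_dists <- lapply(split(seq_along(batch), batch), function(idx) {
  pcs <- prcomp(expr[idx, ], rank. = 3)$x
  norm_dist(pcs, types[idx])
})
ref <- Reduce(`+`, batch_dists) / length(batch_dists)

# Same computation on a full-dataset embedding
# (a plain PCA stands in for an integration output here)
full <- norm_dist(prcomp(expr, rank. = 3)$x, types)

# One correlation per cell type (self-distance dropped), then the mean
per_type <- vapply(seq_len(nrow(ref)), function(i) {
  cor(ref[i, -i], full[i, -i])
}, numeric(1))
score <- mean(per_type)
score
```

Because every per-type value is a correlation, the averaged score necessarily stays within [-1, 1].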
Note
ScaleScores() scales the scores with
$$\displaystyle Score_{scaled} = \frac{Score_{raw} + 1}{2}$$
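As a quick check of the formula, the rescaling maps the raw range [-1, 1] onto [0, 1]. The helper below is a hypothetical stand-in written for this example; it is not ScaleScores() itself.

```r
# Hypothetical helper applying the scaling formula to raw scores
scale_raw <- function(raw) (raw + 1) / 2

scaled <- scale_raw(c(-1, 0, 0.5, 1))
scaled  # 0.00 0.50 0.75 1.00
```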
This score is based on a preprint (see References).
References
Wang H., Leskovec J., Regev A. Metric Mirages in Cell Embeddings. bioRxiv 2024.04.02.587824 [Preprint] (2024). DOI