Score a dimension reduction by correlating normalised centroid distances

Compute a score based on correlation between batch-wise and dataset-wise normalised centroid distances between cell types. A PCA is computed for each batch and is compared to the reduction output by an integration method.

Usage

ScoreScGraph(
  object,
  cell.var,
  batch.var = NULL,
  reduction = Reductions(object),
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)

AddScoreScGraph(
  object,
  integration,
  cell.var,
  batch.var = NULL,
  reduction,
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)

Arguments

object: A Seurat object
cell.var: The name(s) of the column(s) containing cell type labels (must be in the object metadata). Multiple column names are accepted
batch.var: The name of the batch variable (must be in the object metadata). Can be omitted if the Seurat object contains multiple layers
reduction: The name of the reduction(s) to score. Multiple names are allowed
ndims.use: Number of dimensions from reduction to compute centroids with. If the lengths of reduction and ndims.use do not match, ndims.use is recycled or trimmed to have the same length as reduction.
ndims.batch: Number of principal components to compute for batch-wise PCAs (and to compute centroids with)
nfeatures: Number of features to scale before computing batch-wise PCAs
approx: Whether to use truncated singular value decomposition to approximate batch-wise PCAs
trim: The fraction (0 to 0.5) of observations to be trimmed from each end of PCs or reductions when computing centroids
min_cells_type: Minimum number of cells per cell type. Cell types with fewer cells are excluded
min_cells_batch: Minimum number of cells per batch. Batches with fewer cells are excluded
cor_method: Method used to compute the correlation. One of "weighted_pearson", "pearson" and "spearman"
assay: The name of the assay to use
verbose: Whether to print progress messages
integration: Name(s) of the integration(s) to score. Must have the same length as the reduction argument
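
The recycling or trimming of ndims.use described above can be pictured with base R's rep_len(). This is only an illustration of the documented behaviour, not the package's internal code, and the reduction names and dimension counts are made up for the example.

```r
# Hypothetical reduction names, not taken from any real object
reduction <- c("pca", "harmony", "scvi")

# A shorter ndims.use is recycled to the length of reduction
recycled <- rep_len(c(30L, 20L), length(reduction))

# A longer ndims.use is trimmed to the length of reduction
trimmed <- rep_len(c(30L, 20L, 10L, 5L), length(reduction))

print(recycled)
print(trimmed)
```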

Value

ScoreScGraph: a named list, with one element per reduction. Each element is a named vector of raw scores (names are cell type variables)

AddScoreScGraph: the updated Seurat object with the scGraph raw score(s) set for the integrations.

Details

For each batch, a PCA is computed using batch-specific variable features. Then, a centroid (trimmed mean) is computed for each cell type and each PC. Cell-type to cell-type centroid distances are then computed for each PC. For each batch, distances are then normalised (divided by the largest distance) in a cell-type-wise manner and averaged across batches to obtain a single mean distance value between cell types.
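
The batch-wise steps can be sketched in base R on toy data. This is a minimal sketch under stated assumptions, not the package's implementation: the helper names trimmed_centroids() and normalised_distances() are hypothetical, random matrices stand in for the batch-wise PCAs, distances are taken as Euclidean across PCs, and "cell-type-wise" normalisation is read as dividing each cell type's row by its largest distance.

```r
set.seed(1)

# Three hypothetical cell types, 30 cells each
cell_type <- rep(c("B", "T", "NK"), each = 30)

# Trimmed-mean centroid of every cell type on every PC
# (hypothetical helper; rows = cell types, columns = PCs)
trimmed_centroids <- function(emb, labels, trim = 0.2) {
  cents <- sapply(split(seq_len(nrow(emb)), labels), function(idx) {
    apply(emb[idx, , drop = FALSE], 2, mean, trim = trim)
  })
  t(cents)
}

# Pairwise centroid distances, each row divided by its largest value
# (assumption: Euclidean distance, row-wise normalisation)
normalised_distances <- function(centroids) {
  d <- as.matrix(dist(centroids))
  d / apply(d, 1, max)
}

# Two toy "batches": random 90 x 5 matrices stand in for batch-wise PCAs
batch_dists <- lapply(1:2, function(b) {
  emb <- matrix(rnorm(90 * 5), nrow = 90)
  normalised_distances(trimmed_centroids(emb, cell_type, trim = 0.2))
})

# Average the normalised distances across batches
mean_dist <- Reduce(`+`, batch_dists) / length(batch_dists)
print(round(mean_dist, 3))
```

The trimmed mean makes each centroid less sensitive to outlying cells, which is the role of the trim argument.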

The procedure is then partially repeated once (computation of centroids and distances, and normalisation of these distances) on the (un)integrated dimensional reduction of the full dataset.

Finally, the correlations between the first (batch-wise) and the second (dataset-wise) distance values are computed for each cell type. The score is obtained by averaging the correlation values. It is bounded by -1 (worst) and 1 (best).
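
The final step can be illustrated with two small, made-up distance matrices. This is a sketch only: scgraph_like_score() is a hypothetical helper, it uses a plain Pearson correlation rather than the default "weighted_pearson", and it assumes that the zero distance of a cell type to itself is left out of the correlation.

```r
types <- c("B", "NK", "T", "Mono")

# Made-up normalised distances averaged across batches
batch_dist <- matrix(c(0.0, 0.4, 0.6, 1.0,
                       0.5, 0.0, 0.3, 1.0,
                       0.7, 0.2, 0.0, 1.0,
                       1.0, 0.9, 0.8, 0.0),
                     nrow = 4, byrow = TRUE, dimnames = list(types, types))

# Made-up normalised distances from the reduction being scored
integ_dist <- matrix(c(0.0, 0.5, 0.7, 1.0,
                       0.6, 0.0, 0.2, 1.0,
                       0.8, 0.3, 0.0, 1.0,
                       1.0, 0.8, 0.9, 0.0),
                     nrow = 4, byrow = TRUE, dimnames = list(types, types))

# One correlation per cell type, then the average (hypothetical helper)
scgraph_like_score <- function(ref, query, method = "pearson") {
  types <- rownames(ref)
  per_type <- sapply(types, function(ct) {
    others <- setdiff(types, ct)  # assumption: skip the self-distance
    cor(ref[ct, others], query[ct, others], method = method)
  })
  mean(per_type)
}

score <- scgraph_like_score(batch_dist, integ_dist)
print(score)
```

A reduction that preserves the relative ordering of cell type distances seen within batches scores close to 1, while one that inverts it scores close to -1.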

Note

ScaleScores() scales the scores with $$\displaystyle Score_{scaled} = \frac{Score_{raw} + 1}{2}$$.
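
As a quick check of the formula, the rescaling maps the raw range [-1, 1] onto [0, 1]. The helper below is a hypothetical stand-alone function written for illustration, not the code of ScaleScores().

```r
# Hypothetical helper applying the documented formula (raw + 1) / 2
scale_scgraph <- function(raw) {
  (raw + 1) / 2
}

scaled <- scale_scgraph(c(-1, 0, 1))
print(scaled)  # 0.0 0.5 1.0
```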

This score is based on a preprint (see References).

References

Wang H., Leskovec J., Regev A. Metric Mirages in Cell Embeddings. bioRxiv 2024.04.02.587824 [Preprint] (2024). DOI