
Compute a score based on the correlation between batch-wise and dataset-wise normalised centroid distances between cell types. A PCA is computed for each batch and compared with the dimensional reduction produced by an integration method.

Usage

ScoreScGraph(
  object,
  cell.var,
  batch.var = NULL,
  reduction = Reductions(object),
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)

AddScoreScGraph(
  object,
  integration,
  cell.var,
  batch.var = NULL,
  reduction,
  ndims.use = NULL,
  ndims.batch = 10L,
  nfeatures = 1000L,
  approx = FALSE,
  trim = 0.2,
  min_cells_type = 20L,
  min_cells_batch = 20L,
  cor_method = c("weighted_pearson", "pearson", "spearman"),
  assay = NULL,
  verbose = TRUE
)

Arguments

object

A Seurat object

cell.var

The name(s) of the metadata column(s) containing the cell type labels. Multiple column names are accepted.

batch.var

The name of the batch variable (must be in the object metadata). Can be omitted if the Seurat object contains multiple layers.

reduction

The name(s) of the reduction(s) to score. Multiple names are allowed.

ndims.use

Number of dimensions of reduction used to compute centroids. If the lengths of reduction and ndims.use do not match, ndims.use is recycled or trimmed to the length of reduction.

ndims.batch

Number of principal components to compute in each batch-wise PCA (and to use when computing centroids).

nfeatures

Number of features to scale before computing the batch-wise PCAs.

approx

Whether to use truncated singular value decomposition (SVD) to approximate the batch-wise PCAs.

trim

The fraction (0 to 0.5) of observations to trim from each end of each PC (or reduction dimension) when computing centroids as trimmed means.

min_cells_type

Minimum number of cells a cell type must contain; cell types with fewer cells are excluded.

min_cells_batch

Minimum number of cells a batch must contain; batches with fewer cells are excluded.

cor_method

Method used to compute correlations. One of "weighted_pearson", "pearson" or "spearman".

assay

The name of the assay to use.

verbose

Whether to print progress messages.

integration

Name(s) of the integration(s) to score. Its length should match that of the reduction argument.
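Two argument behaviours above can be illustrated in base R. This is a minimal sketch with hypothetical helpers (`recycle_ndims()`, `keep_cells()`), not the package's API. It assumes that `ndims.use` recycling behaves like `rep_len()`, and that `min_cells_type` and `min_cells_batch` are applied independently to counts taken over the whole dataset; the actual implementation may count cell types per batch instead.

```r
# Hypothetical helpers, NOT part of the package API.

# Recycle or trim ndims.use so there is one value per reduction
# (assumed to behave like rep_len()).
recycle_ndims <- function(ndims.use, reduction) {
  rep_len(ndims.use, length(reduction))
}

# Flag cells whose cell type and batch both reach the minimum sizes.
# Assumption: both thresholds use counts over the whole dataset.
keep_cells <- function(cell_type, batch,
                       min_cells_type = 20L, min_cells_batch = 20L) {
  type_counts  <- table(cell_type)
  batch_counts <- table(batch)
  ok_type  <- cell_type %in% names(type_counts)[type_counts >= min_cells_type]
  ok_batch <- batch %in% names(batch_counts)[batch_counts >= min_cells_batch]
  ok_type & ok_batch
}

recycle_ndims(c(20L, 30L), reduction = c("pca", "harmony", "scvi"))
#> [1] 20 30 20
recycle_ndims(c(20L, 30L, 40L), reduction = "pca")
#> [1] 20
```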

Value

ScoreScGraph: a named list with one element per reduction. Each element is a named vector of raw scores (names are the cell type variables).

AddScoreScGraph: the updated Seurat object with the scGraph raw score(s) set for the integrations.

Details

For each batch, a PCA is computed using batch-specific variable features. Then, a centroid (trimmed mean) is computed for each cell type along each PC. Pairwise distances between cell-type centroids are then computed in PC space. Within each batch, the distances from each cell type to all the others are normalised by dividing them by the largest of these distances. The normalised distances are then averaged across batches to obtain a single mean distance value for each pair of cell types.
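The batch-wise step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation. `batch_centroid_distances()` and `mean_batch_distances()` are hypothetical names. `expr` is assumed to be a cells x genes matrix of normalised expression. "Variable features" are approximated by the `nfeatures` highest-variance genes of each batch (Seurat's own feature selection differs). Centroid distances are taken as Euclidean distances in PC space.

```r
# Hypothetical sketch of the batch-wise step (not the package's code).
batch_centroid_distances <- function(expr, cell_type, ndims = 10L,
                                     nfeatures = 1000L, trim = 0.2) {
  vars  <- apply(expr, 2, var)
  n_var <- min(nfeatures, sum(vars > 0))           # skip constant genes
  feats <- order(vars, decreasing = TRUE)[seq_len(n_var)]
  x     <- scale(expr[, feats, drop = FALSE])      # centre and scale genes
  pcs   <- prcomp(x, center = FALSE, rank. = ndims)$x
  types <- sort(unique(as.character(cell_type)))
  # Trimmed-mean centroid of each cell type along each PC
  cent  <- do.call(rbind, lapply(types, function(ct)
    apply(pcs[cell_type == ct, , drop = FALSE], 2, mean, trim = trim)))
  rownames(cent) <- types
  d <- as.matrix(dist(cent))                       # centroid-to-centroid distances
  d / apply(d, 1, max)                             # divide each row by its maximum
}

# Average the normalised distances across batches. Assumption: a cell type
# absent from a batch is ignored for that batch (NA, then na.rm = TRUE).
mean_batch_distances <- function(expr, cell_type, batch, ...) {
  cell_type <- as.character(cell_type)
  per_batch <- lapply(split(seq_len(nrow(expr)), batch), function(idx)
    batch_centroid_distances(expr[idx, , drop = FALSE], cell_type[idx], ...))
  types <- sort(unique(cell_type))
  arr <- array(NA_real_,
               dim = c(length(types), length(types), length(per_batch)),
               dimnames = list(types, types, NULL))
  for (b in seq_along(per_batch)) {
    m <- per_batch[[b]]
    arr[rownames(m), colnames(m), b] <- m
  }
  apply(arr, c(1, 2), mean, na.rm = TRUE)
}
```

Dividing each row by its own maximum makes the normalisation cell-type-wise, so the resulting matrix is generally not symmetric.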

Part of the procedure (computation of centroids, computation of distances and normalisation of these distances) is then repeated once on the (un)integrated dimensional reduction of the full dataset.
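The dataset-wide step skips the PCA and works directly on the embedding. The base-R sketch below uses a hypothetical function `embedding_centroid_distances()`. It assumes `emb` is a cells x dimensions matrix taken from the reduction, that only its first `ndims` columns are used (all columns when `ndims` is `NULL`, an assumption), and that distances are Euclidean.

```r
# Hypothetical sketch of the dataset-wide step (not the package's code).
embedding_centroid_distances <- function(emb, cell_type, ndims = NULL,
                                         trim = 0.2) {
  if (is.null(ndims)) ndims <- ncol(emb)           # assumption: use all dims
  emb   <- emb[, seq_len(min(ndims, ncol(emb))), drop = FALSE]
  types <- sort(unique(as.character(cell_type)))
  # Trimmed-mean centroid of each cell type along each embedding dimension
  cent  <- do.call(rbind, lapply(types, function(ct)
    apply(emb[cell_type == ct, , drop = FALSE], 2, mean, trim = trim)))
  rownames(cent) <- types
  d <- as.matrix(dist(cent))                       # centroid-to-centroid distances
  d / apply(d, 1, max)                             # same row-wise normalisation
}

# Toy 1-D embedding: centroids at 0 (A), 3 (B) and 4 (C)
emb <- matrix(rep(c(0, 3, 4), each = 5), ncol = 1)
embedding_centroid_distances(emb, rep(c("A", "B", "C"), each = 5))["A", ]
#>    A    B    C 
#> 0.00 0.75 1.00 
```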

Finally, for each cell type, the correlation between the batch-averaged distances and the dataset-wide distances is computed. The score is the average of these correlation values and is bounded by -1 (worst) and 1 (best).
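The final correlation and averaging step can be sketched as below, using a hypothetical function `scgraph_score()` that takes the two normalised distance matrices (rows and columns named by cell type). For each cell type, the sketch correlates its distances to every other cell type, excluding itself. Only "pearson" and "spearman" are covered, via `stats::cor()`. The weights behind "weighted_pearson" are not described on this page, so that method is not reproduced here.

```r
# Hypothetical sketch of the scoring step (not the package's code).
scgraph_score <- function(d_batch, d_emb, method = c("pearson", "spearman")) {
  method <- match.arg(method)
  types  <- intersect(rownames(d_batch), rownames(d_emb))
  # One correlation per cell type, over its distances to the other types
  per_type <- vapply(types, function(ct) {
    others <- setdiff(types, ct)
    cor(d_batch[ct, others], d_emb[ct, others], method = method)
  }, numeric(1))
  mean(per_type, na.rm = TRUE)                     # raw score in [-1, 1]
}
```

Identical distance patterns give a score of 1, and fully reversed patterns give -1.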

Note

ScaleScores() rescales the scores to the [0, 1] range with $$Score_{scaled} = \frac{Score_{raw} + 1}{2}$$.
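As a worked example of this formula, the one-line function below maps the raw bounds -1 and 1 to 0 and 1. `scale_scgraph()` is a hypothetical name used only for illustration; it is not ScaleScores() itself.

```r
# Hypothetical illustration of the scaling formula (not ScaleScores()).
scale_scgraph <- function(raw) (raw + 1) / 2

scale_scgraph(c(-1, 0, 0.6, 1))
#> [1] 0.0 0.5 0.8 1.0
```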

This score is based on a preprint (see References).

References

Wang H., Leskovec J., Regev A. Metric Mirages in Cell Embeddings. bioRxiv 2024.04.02.587824 [Preprint] (2024). DOI