Compute scores based on the average silhouette width (ASW) metric.

ScoreASW: First, cell-to-cell distances are computed on the provided embedding or layer matrix. Then, the silhouette score is calculated to estimate the quality of the partition defined by the cell type label variable. Hence, this score measures to what extent cells with the same label cluster together.

ScoreASWBatch: Similar, but with the batch variable. This score provides an estimation of batch mixing.

Usage

ScoreASW(
  object,
  cell.var,
  what,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)

AddScoreASW(
  object,
  integration,
  cell.var,
  what,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)

ScoreASWBatch(
  object,
  batch.var = NULL,
  cell.var = NULL,
  what,
  per.cell.var = TRUE,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)

AddScoreASWBatch(
  object,
  integration,
  batch.var = NULL,
  cell.var = NULL,
  what,
  per.cell.var = TRUE,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)

Arguments

object

A Seurat object

cell.var

The name of the column with cell type label variable (must be in the object metadata). Ignored by ScoreASWBatch when per.cell.var = FALSE

what

the name of the dimension reduction or layer to score. Must be in the Seurat object or obtainable after a JoinLayers() call.

assay

name of the assay to use. The output of DefaultAssay() is used by default

metric

name of the distance metric to use. One of 'euclidean', 'cosine', 'angular', 'manhattan', 'hamming'. See Note for details.

dist.package

name of the package to compute distances with. One of 'distances', 'Rfast', 'parallelDist', 'stats'. The last one is always available; the others must be installed beforehand. They are ordered from fastest to slowest. When the requested package is not installed, the fastest among the available ones is picked. See Note for details.

verbose

Print messages. Set to FALSE to disable

...

additional parameters to pass to the distance computation functions

integration

name of the integration to score

batch.var

The name of the column with batch variable (must be in the object metadata). Required by ScoreASWBatch.

per.cell.var

whether to compute silhouette coefficients with the batch variable for each cell-type separately (default behaviour). Setting to FALSE causes the silhouette coefficients to be computed on the whole data directly.
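
The fallback described for dist.package (use the requested package if installed, otherwise the fastest one available) can be pictured with the sketch below. `pick_dist_package()` is a hypothetical helper written for illustration only; it is not part of the package and the actual selection logic may differ, for instance by also checking which metrics each package supports.

```r
# Hypothetical helper (an assumption, not the package's code): return the
# requested package when it is installed, else the first installed package
# in the documented fastest-to-slowest order.
pick_dist_package <- function(requested,
                              preference = c("distances", "Rfast",
                                             "parallelDist", "stats")) {
  requested <- match.arg(requested, preference)
  if (requireNamespace(requested, quietly = TRUE)) {
    return(requested)
  }
  installed <- vapply(preference, requireNamespace, logical(1),
                      quietly = TRUE)
  # 'stats' ships with R, so at least one candidate is always available
  unname(preference[installed][1])
}

chosen <- pick_dist_package("Rfast")
chosen
```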

Value

ScoreASW and ScoreASWBatch: a single float between 0 and 1, corresponding to the scaled average silhouette score.

AddScoreASW and AddScoreASWBatch: the updated Seurat object with the ASW score(s) set for the integration.

Details

ScoreASW: Given a matrix (dimension reduction or layer), the cell-to-cell distance matrix \(D\) is computed. Then, the silhouette width \(s(i)\) is calculated for each cell \(i\) with a label \(c \in L\) (\(L\) is the set of possible cell type labels). Finally, the mean of all \(s(i)\) is computed (i.e. the ASW) and scaled between 0 and 1: $$\displaystyle ASW = \frac{1}{N} \times \sum_{i=1}^{N}{s(i)} \\[10pt] score = \frac{ASW + 1}{2}$$ with \(N\) being the total number of cells.
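
As a rough illustration of the steps described for ScoreASW, the sketch below reproduces them on a toy matrix with stats::dist() and cluster::silhouette(). The data, the variable names and the choice of euclidean distances are assumptions made for the example; this is not the package's internal code.

```r
library(cluster)  # provides silhouette()

# Toy embedding (made-up values): 6 cells in 2 dimensions,
# forming two well-separated cell types
emb <- rbind(
  c(0.0, 0.0), c(0.1, 0.0), c(0.0, 0.1),
  c(5.0, 5.0), c(5.1, 5.0), c(5.0, 5.1)
)
cell_type <- factor(c("A", "A", "A", "B", "B", "B"))

# Cell-to-cell distance matrix D
D <- dist(emb, method = "euclidean")

# Silhouette width s(i) of each cell with respect to its cell type label
sil <- silhouette(as.integer(cell_type), D)
s <- sil[, "sil_width"]

# ASW is the mean of s(i) over all cells; rescale from [-1, 1] to [0, 1]
asw <- mean(s)
score <- (asw + 1) / 2
score
```

Because the two toy cell types are far apart, every silhouette width is close to 1 and so is the scaled score.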

ScoreASWBatch: The default parameters (per.cell.var = TRUE) correspond to the original score from Luecken M.D. et al., 2022. It is computed as follows: for each cell type label \(c\) among the set of all labels \(L\), the cell-to-cell distance matrix \(D_c\) is computed for cells \(i\) with label \(c\). Each cell's silhouette width \(s(i)\) is calculated according to the batch variable and transformed into \(1 - \left| s(i) \right|\), so that the value approaches 1 as the absolute silhouette width approaches 0. An average silhouette width \(ASW_c\) is then computed per cell type label \(c \in L\) (with \(\left| c \right|\) the number of cells with label \(c\)) and the mean of those corresponds to the final score: $$\displaystyle ASW_c = \frac{1}{\left| c \right|} \times \sum_{i \in c}{\left( 1 - \left| s(i) \right| \right)} \\[10pt] score = \frac{1}{\left| L \right|} \times \sum_{c \in L}{ASW_c}$$

When per.cell.var = FALSE, the silhouette widths are computed for all cells at once (just like for ScoreASW but on the batch variable), then transformed and averaged in the same way: $$\displaystyle score = ASW = \frac{1}{N} \times \sum_{i=1}^{N}{\left( 1 - \left| s(i) \right| \right)}$$ with \(N\) being the total number of cells.
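
The two variants of the batch score can be sketched as follows, again with stats::dist() and cluster::silhouette() on simulated data. `batch_asw()` is a hypothetical helper and the random embedding, labels and batches are assumptions for illustration. The sketch also assumes that every cell type contains cells from at least two batches; it does not handle cell types found in a single batch.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Simulated embedding: 40 cells, 2 dimensions, no real structure
emb <- matrix(rnorm(40 * 2), ncol = 2)
cell_type <- rep(c("A", "B"), each = 20)
batch <- rep(c("b1", "b2"), times = 20)

# Hypothetical helper: mean of 1 - |s(i)|, with s(i) computed on the batch
batch_asw <- function(emb, batch) {
  D <- dist(emb)
  s <- silhouette(as.integer(factor(batch)), D)[, "sil_width"]
  mean(1 - abs(s))
}

# Analogue of per.cell.var = TRUE: one ASW_c per cell type, then their mean
cells_by_type <- split(seq_len(nrow(emb)), cell_type)
asw_c <- sapply(cells_by_type, function(idx) {
  batch_asw(emb[idx, , drop = FALSE], batch[idx])
})
score_per_type <- mean(asw_c)

# Analogue of per.cell.var = FALSE: all cells at once
score_global <- batch_asw(emb, batch)

c(per_type = score_per_type, global = score_global)
```

Both values lie between 0 and 1, with values near 1 indicating that batches are well mixed.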

Note

These scores are adaptations of the (cell-type) ASW and the batch ASW as described in Luecken M.D. et al., 2022.

Hamming distance is only supported by the parallelDist package, while distances can only compute euclidean and related distances (cosine and angular). Angular distances are actually referred to as 'cosine' in FindNeighbors() (annoy.metric), hence they are called 'angular' here. Actual cosine dissimilarity-derived distances are returned when metric = 'cosine'. Internally, angular distances are computed as euclidean distances on \(L^2\)-normalised data. Cosine distances are further transformed with: $$\displaystyle D_{cosine} = \frac{D_{angular}^2}{2}$$
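
The relation between the two metrics can be checked numerically. The sketch below assumes that "angular" means the euclidean distance between rows scaled to unit \(L^2\) norm, and compares \(D_{angular}^2 / 2\) with the cosine dissimilarity (1 minus the cosine similarity) computed directly. The random matrix and variable names are illustrative only.

```r
set.seed(42)
x <- matrix(rnorm(5 * 3), nrow = 5)  # 5 cells, 3 features (made-up data)

# Scale each row to unit L2 norm
x_unit <- x / sqrt(rowSums(x^2))

# "Angular" distance: euclidean distance between normalised rows
d_angular <- as.matrix(dist(x_unit))

# Cosine dissimilarity computed directly: 1 - cosine similarity
d_cosine_direct <- 1 - tcrossprod(x_unit)

# Cosine dissimilarity derived from the angular distance
d_cosine <- d_angular^2 / 2

all.equal(d_cosine, d_cosine_direct, check.attributes = FALSE)
```

The identity follows from \(\lVert u - v \rVert^2 = 2 - 2\cos(u, v)\) for unit vectors \(u\) and \(v\).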

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).

See also

FindNeighbors; silhouette, to know more about the silhouette metric; GetNeighborsPerBatch and GetPropInterBatch