Score an embedding or a count matrix with the average silhouette width
Source: R/metrics_silhouette.R
score-asw.Rd
Compute scores based on the average silhouette width (ASW) metric.
ScoreASW: First, cell-to-cell distances are computed on the provided
embedding or layer matrix. Then, the silhouette score is calculated to
estimate the quality of the partition defined by the cell type label
variable. Hence, this score measures to what extent cells with identical
labels cluster together.
ScoreASWBatch: Similar, but computed with the batch variable. This score
provides an estimate of batch mixing.
Usage
ScoreASW(
  object,
  cell.var,
  what,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)
AddScoreASW(
  object,
  integration,
  cell.var,
  what,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)
ScoreASWBatch(
  object,
  batch.var = NULL,
  cell.var = NULL,
  what,
  per.cell.var = TRUE,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)
AddScoreASWBatch(
  object,
  integration,
  batch.var = NULL,
  cell.var = NULL,
  what,
  per.cell.var = TRUE,
  assay = NULL,
  metric = c("euclidean", "cosine", "angular", "manhattan", "hamming"),
  dist.package = c("distances", "Rfast", "parallelDist", "stats"),
  verbose = TRUE,
  ...
)
Arguments
- object
A Seurat object
- cell.var
The name of the column holding the cell type labels (must be in the object metadata). Ignored by ScoreASWBatch when per.cell.var = FALSE.
- what
The name of the dimension reduction or layer to score. Must be in the Seurat object or obtainable after a JoinLayers() call.
- assay
The name of the assay to use. The output of DefaultAssay() is used by default.
- metric
The name of the distance metric to use. One of 'euclidean', 'cosine', 'angular', 'manhattan', 'hamming'. See Note for details.
- dist.package
The name of the package to compute distances with. One of 'distances', 'Rfast', 'parallelDist', 'stats'. The last one is always available; the others must be installed beforehand. They are ordered from fastest to slowest. When the requested package is not installed, the fastest among the available ones is picked. See Note for details.
- verbose
Print messages. Set to FALSE to disable.
- ...
Additional parameters to pass to the distance computation functions.
- integration
The name of the integration to score.
- batch.var
The name of the column holding the batch variable (must be in the object metadata). Required by ScoreASWBatch.
- per.cell.var
Whether to compute silhouette coefficients with the batch variable for each cell type separately (default behaviour). Setting it to FALSE causes the silhouette coefficients to be computed on the whole dataset at once.
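The fallback described for dist.package (use the requested package if installed, otherwise the fastest installed one) can be pictured with the sketch below. `pick_dist_package()` is a hypothetical helper written for illustration only; it is not part of the documented API, it may differ from the actual implementation, and it ignores the per-metric restrictions mentioned in the Note.

```r
# Hypothetical helper (illustration only): candidates are assumed to be
# ordered from fastest to slowest, as stated for `dist.package`.
pick_dist_package <- function(requested,
                              candidates = c("distances", "Rfast",
                                             "parallelDist", "stats")) {
  # Use the requested package when it is installed.
  if (requireNamespace(requested, quietly = TRUE)) {
    return(requested)
  }
  # Otherwise keep the installed candidates and take the first (fastest) one.
  # "stats" ships with R, so this vector is never empty.
  installed <- Filter(function(p) requireNamespace(p, quietly = TRUE),
                      candidates)
  installed[[1]]
}

# A package that is not installed falls back to an available candidate.
fallback <- pick_dist_package("aPackageThatIsNotInstalled")
```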
Value
ScoreASW and ScoreASWBatch: a single numeric value between 0 and 1, corresponding to the scaled average silhouette score.
AddScoreASW and AddScoreASWBatch: the updated Seurat object with the ASW score(s) set for the integration.
Details
ScoreASW: Given a matrix (dimension reduction or layer), the
cell-to-cell distance matrix \(D\) is computed. Then, the silhouette width
\(s(i)\) is calculated for each cell \(i\) with a label \(c \in L\)
(\(L\) is the set of possible cell type labels). Finally, the mean of all
\(s(i)\) over the \(N\) cells is computed (i.e. the ASW) and scaled between 0 and 1:
$$\displaystyle ASW = \frac{1}{N} \times \sum_{i=1}^{N}{s(i)} \\[10pt]
score = \frac{ASW + 1}{2}$$
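As a rough illustration of the cell type score, the sketch below reproduces the three steps (distance matrix, silhouette widths, rescaled mean) on simulated data. It assumes the cluster package is installed and uses stats::dist() for distances; the toy embedding and variable names are made up, and this is not necessarily how ScoreASW is implemented internally.

```r
# Simulated 2-D embedding: 3 well-separated cell types of 20 cells each.
set.seed(42)
labels <- rep(c("A", "B", "C"), each = 20)
centers <- c(A = 0, B = 5, C = 10)
embedding <- cbind(
  rnorm(60, mean = centers[labels], sd = 0.5),
  rnorm(60, mean = centers[labels], sd = 0.5)
)

# Cell-to-cell distance matrix D (Euclidean here).
D <- dist(embedding, method = "euclidean")

# Silhouette width s(i) of every cell with respect to its cell type label.
sil <- cluster::silhouette(as.integer(factor(labels)), D)
s <- sil[, "sil_width"]

# ASW is the mean of all s(i); rescale it from [-1, 1] to [0, 1].
asw <- mean(s)
score <- (asw + 1) / 2
```

Because the simulated cell types are far apart, the resulting score lies close to 1.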
ScoreASWBatch: The default parameters (per.cell.var = TRUE)
correspond to the original score from Luecken M.D. et al., 2022. It is
computed as follows: for each cell type label \(c\) among the set of all
labels \(L\), the cell-to-cell distance matrix \(D_c\) is computed for
cells \(i\) with label \(c\). Each cell's silhouette width \(s(i)\) is
calculated according to the batch variable and transformed into
\(1 - |s(i)|\), which gets closer to 1 as \(|s(i)|\) gets closer to 0. An average silhouette width
\(ASW_c\) is then computed per cell type label \(c \in L\), and the mean
of those corresponds to the final score:
$$\displaystyle ASW_c = \frac{1}{\left| c \right|} \times \sum_{i \in c}{1 - \left| s(i) \right|} \\[10pt]
score = \frac{1}{\left| L \right|} \times \sum_{c \in L}{ASW_c}$$
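The per-cell-type batch score can be sketched as follows on simulated data in which the embedding depends on cell type only, so batches are well mixed. The cluster package is assumed to be installed; the data, labels and variable names are invented for the example, and the code is only an approximation of the formula above, not the package's own implementation.

```r
set.seed(1)
n <- 120
cell_type <- rep(c("Tcell", "Bcell", "NK"), each = 40)
batch <- rep(c("b1", "b2"), times = 60)   # 20 cells per batch and cell type

# The embedding is driven by cell type only: no batch effect is simulated.
ct_center <- c(Tcell = 0, Bcell = 6, NK = 12)
embedding <- cbind(
  rnorm(n, mean = ct_center[cell_type]),
  rnorm(n, mean = ct_center[cell_type])
)

# ASW_c: within each cell type, silhouette widths on the batch labels,
# transformed with 1 - |s(i)| and averaged.
asw_per_type <- vapply(unique(cell_type), function(ct) {
  idx <- which(cell_type == ct)
  D_c <- dist(embedding[idx, , drop = FALSE])
  sil <- cluster::silhouette(as.integer(factor(batch[idx])), D_c)
  mean(1 - abs(sil[, "sil_width"]))
}, numeric(1))

# Final score: mean of the per-cell-type averages.
score <- mean(asw_per_type)
```

With well-mixed batches the silhouette widths stay near 0, so the score is expected to be high.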
When per.cell.var = FALSE, \(ASW\) is computed for all cells at once
(just like for ScoreASW but on the batch variable), then transformed
and averaged similarly:
$$\displaystyle score = ASW = \frac{1}{N} \times \sum_{i=1}^{N}{1 - \left| s(i) \right|}$$
with \(N\) being the total number of cells.
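For contrast, here is a minimal sketch of the global variant on simulated data with a deliberately strong batch effect. It assumes the cluster package is installed; the data and names are illustrative and the code is not taken from the package.

```r
set.seed(7)
batch <- rep(c("b1", "b2"), each = 50)

# Strong simulated batch effect: batch b2 is shifted along the first axis.
shift <- c(b1 = 0, b2 = 8)
embedding <- cbind(rnorm(100, mean = shift[batch]), rnorm(100))

# Silhouette widths on the batch labels, using all N cells at once.
D <- dist(embedding)
sil <- cluster::silhouette(as.integer(factor(batch)), D)

# score = mean of 1 - |s(i)| over all cells.
score <- mean(1 - abs(sil[, "sil_width"]))
```

Here the batches form two distinct clusters, so the silhouette widths are large and the score is low, signalling poor batch mixing.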
Note
These scores are adaptations of the (cell type) ASW and the batch ASW as described in Luecken M.D. et al., 2022.
Hamming distance is only supported by the parallelDist package, while
distances can only compute euclidean and related distances (cosine and
angular). Angular distances are actually referred to as 'cosine' in
FindNeighbors() (annoy.metric), hence called 'angular' here. Actual cosine
dissimilarity-derived distances are returned when metric = 'cosine'.
Internally, angular distances are computed as euclidean distances after
\(L^2\) normalisation. Cosine distances are further transformed with:
$$\displaystyle D_{cosine} = \frac{D_{angular}^2}{2}$$
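The transform follows from the identity \(\lVert u - v \rVert^2 = 2 - 2\cos\theta\) for unit vectors \(u\) and \(v\), so that \(D_{angular}^2 / 2 = 1 - \cos\theta\). The base R check below illustrates this numerically on random data, assuming that "euclidean distances after \(L^2\) normalisation" means distances between rows rescaled to unit length; it is a verification sketch, not the package's code.

```r
set.seed(3)
x <- matrix(rnorm(5 * 4), nrow = 5)   # 5 cells, 4 dimensions

# L2-normalise each row, then take Euclidean distances ("angular" here).
x_unit <- x / sqrt(rowSums(x^2))
d_angular <- as.matrix(dist(x_unit, method = "euclidean"))

# Transform given in the Note.
d_cosine <- d_angular^2 / 2

# Reference: 1 - cosine similarity, computed directly from the unit vectors.
d_ref <- 1 - tcrossprod(x_unit)

# Largest absolute discrepancy between the two computations.
max_diff <- max(abs(d_cosine - d_ref))
```

The discrepancy is at the level of floating-point error, which confirms that both routes yield the same cosine dissimilarity.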
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).
See also
FindNeighbors, silhouette to know more about the silhouette metric, GetNeighborsPerBatch and GetPropInterBatch