Compute a score based on the Adjusted Rand Index (ARI) between a clustering result and one or more cell type label variables. A score of 0 corresponds to a random clustering and a score of 1 to a clustering that perfectly matches the cell type labelling.

Usage

ScoreARI(object, cell.var, clust.var = "seurat_clusters")

AddScoreARI(object, integration, cell.var, clust.var = "seurat_clusters")

Arguments

object

A Seurat object

cell.var

The name(s) of the metadata column(s) holding the cell type labels (must be in the object metadata). Multiple column names are accepted.

clust.var

The name of the metadata column holding the cluster id assigned to each cell (must be in the object metadata). Only one column name is accepted.

integration

Name of the integration to score.

Value

ScoreARI: a named array with one value for each element of cell.var that matches a column name in the object's metadata. Names are the matched cell.var entries and values are the corresponding ARI.

AddScoreARI: the updated Seurat object with the ARI score(s) set for the integration.

Details

The ARI is the Rand index (RI) corrected for chance: $$\displaystyle ARI = \frac{RI - RI_{expected}}{\max(RI) - RI_{expected}}$$

More precisely, a contingency table is computed from the two variables \(L\) and \(C\), which have \(r\) and \(s\) groups respectively. For \(i \in [\![1,r]\!]\) and \(j \in [\![1,s]\!]\), \(n_{ij}\) is the number of samples (i.e. cells) common to \(L_i\) and \(C_j\), \(a_i\) is the number of samples in \(L_i\), \(b_j\) is the number of samples in \(C_j\), and \(n\) is the total number of samples. The ARI is: $$\displaystyle ARI = \frac{\left. \sum_{ij} \binom{n_{ij}}{2} - \left(\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right) \right/ \binom{n}{2} }{ \left. \frac{1}{2} \left(\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right) - \left(\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right) \right/ \binom{n}{2}}$$
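As an illustration of the formula above, here is a minimal base R sketch that evaluates it term by term from two label vectors. The helper name `ari_sketch` and the toy vectors are hypothetical and are not part of the package. The sketch shows how the formula reads, not necessarily how ScoreARI is implemented.

```r
# Hypothetical helper (not part of the package): ARI from two label vectors,
# following the contingency-table formula term by term.
ari_sketch <- function(labels, clusters) {
  stopifnot(length(labels) == length(clusters))
  tab <- table(labels, clusters)            # contingency table of n_ij
  pairs <- function(x) sum(choose(x, 2))    # sum of "x choose 2" terms
  sum_nij <- pairs(tab)                     # sum_ij choose(n_ij, 2)
  sum_a <- pairs(rowSums(tab))              # sum_i choose(a_i, 2)
  sum_b <- pairs(colSums(tab))              # sum_j choose(b_j, 2)
  expected <- sum_a * sum_b / choose(length(labels), 2)
  max_index <- (sum_a + sum_b) / 2
  # Note: the denominator is zero in degenerate cases, e.g. when both
  # variables place all samples in a single group; the result is then NaN.
  (sum_nij - expected) / (max_index - expected)
}

# Toy data (assumed for illustration): six cells, two cell types.
cell_type <- c("B", "B", "B", "T", "T", "T")
perfect   <- c(2, 2, 2, 1, 1, 1)   # same partition, different cluster ids
partial   <- c(1, 1, 2, 2, 2, 2)   # one B cell grouped with the T cells

ari_perfect <- ari_sketch(cell_type, perfect)   # partitions agree exactly
ari_partial <- ari_sketch(cell_type, partial)   # strictly between 0 and 1
```

Because the ARI only compares partitions, the cluster ids themselves do not matter: `perfect` uses different ids than `cell_type` but still yields an ARI of 1.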

Note

The metric is symmetric. Switching cell.var with clust.var will return the same value.
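The symmetry follows from the formula: swapping the two variables transposes the contingency table, which exchanges the \(a_i\) and \(b_j\), and these enter the ARI only through a product and a sum. The short base R sketch below checks this on toy data. The helper `ari_from_table` and the vectors are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): ARI from a contingency table.
ari_from_table <- function(tab) {
  pairs <- function(x) sum(choose(x, 2))
  sum_a <- pairs(rowSums(tab))   # row margins
  sum_b <- pairs(colSums(tab))   # column margins
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  (pairs(tab) - expected) / ((sum_a + sum_b) / 2 - expected)
}

# Toy data (assumed for illustration): eight cells, three types, three clusters.
cell_type <- c("B", "B", "NK", "T", "T", "T", "NK", "B")
cluster   <- c(1, 1, 2, 3, 3, 2, 2, 1)

forward <- ari_from_table(table(cell_type, cluster))
swapped <- ari_from_table(table(cluster, cell_type))   # transposed table

same_value <- isTRUE(all.equal(forward, swapped))
```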

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–57 (2022).