Compute a score based on normalised mutual information between a clustering result and one or more cell type label variable(s). A score of 0 reflects an absence of mutual information, and a score of 1 reflects a perfect correlation.
Arguments
- object
A Seurat object
- cell.var
The name(s) of the column(s) with cell type label variable (must be in the object metadata). Multiple column names are accepted
- clust.var
The name of the column with cluster id assignment for each cell (must be in the object metadata). Only one column name is accepted
- average.entropy
Method used to compute the normalisation denominator from the two variables' entropies. One of 'mean', 'geom', 'min' and 'max', corresponding to the arithmetic mean, geometric mean, minimum and maximum of the entropies respectively.
- integration
Name of the integration to score
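The four average.entropy options can be compared on a toy case. The sketch below is illustrative only: the entropy values and the variable names (h_labels, h_clusters, denominators) are assumptions made up for the example, not values or objects produced by the package.

```r
# Hypothetical entropies (in nats), chosen for illustration only.
h_labels   <- log(4)  # e.g. four equally frequent cell types
h_clusters <- log(2)  # e.g. two equally frequent clusters

# Candidate normalisation denominators, one per average.entropy option.
denominators <- c(
  mean = (h_labels + h_clusters) / 2,     # arithmetic mean
  geom = sqrt(h_labels * h_clusters),     # geometric mean
  min  = min(h_labels, h_clusters),       # minimum entropy
  max  = max(h_labels, h_clusters)        # maximum entropy
)

# Sort from the smallest to the largest denominator.
print(sort(denominators))
```

For non-negative entropies the ordering is always min <= geom <= mean <= max, so for a given mutual information 'min' yields the highest score and 'max' the lowest. Because mutual information never exceeds the smaller of the two entropies, all four options keep the score between 0 and 1.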
Value
ScoreNMI: a named array with as many values as there are common strings between cell.var and the column names of the object's metadata. Names are cell.var and values are NMI scores.
AddScoreNMI: the updated Seurat object with the NMI score(s) set for the integration.
Details
Consider a dataset of \(N\) cells, with \(\left|L_i\right|\) the number of cells labelled with cell type \(L_i\) and \(\left|C_j\right|\) the number of cells in cluster \(C_j\). The discrete mutual information \(MI\) approximation is given by:
$$\displaystyle MI(L, C) = \sum_{i=1}^{\left|L\right|}\sum_{j=1}^{\left|C\right|} \left( \frac{\left|L_i \cap C_j\right|}{N} \times \log \left(\frac{N \times \left|L_i \cap C_j\right|}{\left|L_i\right| \left|C_j\right|} \right) \right)$$
where \(\left|L\right|\) and \(\left|C\right|\) are the numbers of distinct cell types and clusters respectively. \(MI\) is then normalised (scaled) by a denominator, which is computed by applying a function \(f\) to both variables' entropies (\(H\)). \(f\) can be the arithmetic mean, geometric mean, maximum or minimum of the entropies:
$$\displaystyle NMI(L, C) = \frac{MI(L, C)}{f(H(L), H(C))}$$
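The formulas above can be sketched in base R on two plain vectors. This is a minimal illustration under stated assumptions, not the package's implementation: the function name nmi_sketch and its arguments labels and clusters are hypothetical, the natural logarithm is assumed (the base cancels in the ratio), and returning NA when the denominator is zero is a choice made for this example only.

```r
# Hypothetical helper for illustration; not part of the documented package.
nmi_sketch <- function(labels, clusters,
                       average.entropy = c("mean", "geom", "min", "max")) {
  average.entropy <- match.arg(average.entropy)
  stopifnot(length(labels) == length(clusters))
  n <- length(labels)

  # Joint proportions |L_i n C_j| / N and their marginals |L_i| / N, |C_j| / N.
  joint <- table(labels, clusters) / n
  p_l <- rowSums(joint)
  p_c <- colSums(joint)

  # Shannon entropy, skipping empty categories (0 * log(0) is taken as 0).
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # Mutual information: sum over non-empty (label, cluster) pairs.
  expected <- outer(p_l, p_c)
  nz <- joint > 0
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))

  # Normalisation denominator f(H(L), H(C)).
  h_l <- entropy(p_l)
  h_c <- entropy(p_c)
  denom <- switch(average.entropy,
    mean = (h_l + h_c) / 2,
    geom = sqrt(h_l * h_c),
    min  = min(h_l, h_c),
    max  = max(h_l, h_c)
  )

  # Assumption of this sketch: an undefined ratio is reported as NA.
  if (denom == 0) return(NA_real_)
  mi / denom
}

# Clusters that reproduce the labels exactly, then clusters independent of them.
perfect     <- nmi_sketch(c("a", "a", "b", "b"), c(1, 1, 2, 2))
independent <- nmi_sketch(c("a", "a", "b", "b"), c(1, 2, 1, 2))
```

In the first call every cluster matches one cell type, so the score reaches its maximum; in the second the clusters carry no information about the labels, so the mutual information, and therefore the score, is zero.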
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2021).