Compute a score based on normalised mutual information (NMI) between a clustering result and one or more cell type label variable(s). Scores range from 0 (absence of mutual information) to 1 (perfect correlation).

Usage

ScoreNMI(
  object,
  cell.var,
  clust.var = "seurat_clusters",
  average.entropy = c("mean", "geom", "min", "max")
)

AddScoreNMI(
  object,
  integration,
  cell.var,
  clust.var = "seurat_clusters",
  average.entropy = c("mean", "geom", "min", "max")
)

Arguments

object

A Seurat object

cell.var

The name(s) of the column(s) containing the cell type labels (must be in the object metadata). Multiple column names are accepted.

clust.var

The name of the column containing the cluster id assigned to each cell (must be in the object metadata). Only one column name is accepted.

average.entropy

Method used to compute the normalisation denominator from the two variables' entropies. One of 'mean', 'geom', 'min' and 'max', corresponding to the arithmetic mean, geometric mean, minimum and maximum of the entropies respectively.

integration

Name of the integration to score.
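
The four average.entropy options can be illustrated in base R. This is only a sketch: the helpers `entropy()` and `denominator()` are hypothetical names introduced here, not functions of the package, and the natural logarithm is an assumption.

```r
# Hypothetical helper: Shannon entropy (natural log) of a label vector.
entropy <- function(x) {
  p <- as.numeric(table(x)) / length(x)
  -sum(p * log(p))
}

# Hypothetical helper mirroring the four documented options.
denominator <- function(h1, h2,
                        average.entropy = c("mean", "geom", "min", "max")) {
  average.entropy <- match.arg(average.entropy)
  switch(average.entropy,
    mean = (h1 + h2) / 2,   # arithmetic mean of the entropies
    geom = sqrt(h1 * h2),   # geometric mean of the entropies
    min  = min(h1, h2),     # minimum entropy
    max  = max(h1, h2)      # maximum entropy
  )
}

labels   <- c("B", "B", "T", "T", "NK", "NK")
clusters <- c(1, 1, 2, 2, 2, 2)
h_l <- entropy(labels)    # three equally sized cell types, i.e. log(3)
h_c <- entropy(clusters)

# One denominator per option
sapply(c("mean", "geom", "min", "max"),
       function(m) denominator(h_l, h_c, m))
```

Because min ≤ geometric mean ≤ arithmetic mean ≤ max for non-negative entropies, 'min' yields the largest NMI and 'max' the smallest for a given pair of variables.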

Value

ScoreNMI: a named array with one value for each element of cell.var that matches a column name of the object's metadata. Names are the matching cell.var elements and values are NMI scores.

AddScoreNMI: the updated Seurat object with the NMI score(s) set for the integration.

Details

Consider a dataset of \(N\) cells, where \(\left|L_i\right|\) is the number of cells labelled with cell type \(L_i\) and \(\left|C_j\right|\) is the number of cells in cluster \(C_j\). The discrete mutual information \(MI\) is approximated by:

$$\displaystyle MI(L, C) = \sum_{i=1}^{\left|L\right|}\sum_{j=1}^{\left|C\right|} \left( \frac{\left|L_i \cap C_j\right|}{N} \times \log \left(\frac{N \times \left|L_i \cap C_j\right|}{\left|L_i\right| \left|C_j\right|} \right) \right)$$

where \(\left|L\right|\) and \(\left|C\right|\) are the numbers of cell types and clusters respectively. \(MI\) is then normalised (scaled) by a denominator, obtained by applying a function \(f\) to the entropies (\(H\)) of both variables. \(f\) can be the arithmetic mean, geometric mean, maximum or minimum of the two entropies:

$$\displaystyle NMI(L, C) = \frac{MI(L, C)}{f(H(L), H(C))}$$
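
The formulas above can be traced step by step in base R. The function below is a minimal sketch under stated assumptions, not the package's implementation: `nmi_sketch()` and its argument `f` are hypothetical names, the natural logarithm is assumed, and empty intersections are taken to contribute zero to the sum.

```r
# Hypothetical helper (illustration only): NMI of two label vectors.
# `f` combines the two entropies; the default is their arithmetic mean.
nmi_sketch <- function(labels, clusters,
                       f = function(h1, h2) (h1 + h2) / 2) {
  n     <- as.numeric(length(labels))         # N
  joint <- unclass(table(labels, clusters))   # |L_i ∩ C_j|
  n_l   <- rowSums(joint)                     # |L_i|
  n_c   <- colSums(joint)                     # |C_j|

  # Terms of the double sum; cells with an empty intersection are dropped.
  terms <- (joint / n) * log(n * joint / outer(n_l, n_c))
  mi    <- sum(terms[joint > 0])

  # Entropy of a variable from its category counts.
  h <- function(counts) -sum((counts / n) * log(counts / n))

  mi / f(h(n_l), h(n_c))
}

# Clusters that match cell types up to relabelling: NMI of 1
perfect <- nmi_sketch(c("B", "B", "T", "T"), c(2, 2, 1, 1))

# Clusters independent of cell types: NMI of 0
independent <- nmi_sketch(c("B", "B", "T", "T"), c(1, 2, 1, 2))

# Same data, geometric mean of entropies as denominator
geom <- nmi_sketch(c("B", "B", "T", "T", "NK"), c(1, 1, 2, 2, 2),
                   f = function(h1, h2) sqrt(h1 * h2))
```

Passing a different `f` (for instance `min` or `max`) changes only the denominator; the mutual information in the numerator stays the same.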

Note

The metric is symmetric. Switching cell.var with clust.var will return the same value.
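
The symmetry can be checked numerically on toy vectors. This is a compact sketch in base R, assuming the arithmetic mean of entropies as denominator; `nmi_toy()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (illustration only): NMI from joint proportions.
nmi_toy <- function(x, y) {
  p  <- unclass(table(x, y)) / length(x)   # joint proportions
  px <- rowSums(p)                         # marginal proportions of x
  py <- colSums(p)                         # marginal proportions of y
  mi <- sum((p * log(p / outer(px, py)))[p > 0])
  h  <- function(q) -sum(q * log(q))
  mi / mean(c(h(px), h(py)))
}

cell_type <- c("B", "B", "T", "T", "NK", "NK", "NK")
cluster   <- c(1, 1, 2, 2, 2, 3, 3)

# Swapping the two arguments gives the same score (up to rounding)
forward  <- nmi_toy(cell_type, cluster)
backward <- nmi_toy(cluster, cell_type)
isTRUE(all.equal(forward, backward))
```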

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2021).