SeuratIntegrate incorporates 11 scoring metrics: 6 quantify the degree of batch mixing (batch correction), while 5 assess the preservation of biological differences (bio-conservation) based on ground truth cell type labels.

Below is a table summarising each score’s input and type:

Table summarising the inputs required for each score, and the type of score it belongs to.

| Score name | Requires a cell type variable | Requires a clustering variable | Input | Score type |
|---|---|---|---|---|
| Cell cycle regression | - | - | Dimension reduction | bio-conservation |
| PCA regression | - | - | Dimension reduction | batch correction |
| PCA density | - | - | Dimension reduction | batch correction |
| ASW batch | cell-type variable | - | Dimension reduction | batch correction |
| ASW | cell-type variable | - | Dimension reduction | bio-conservation |
| ARI | cell-type variable | clustering variable | - | bio-conservation |
| NMI | cell-type variable | clustering variable | - | bio-conservation |
| cLISI | cell-type variable | - | Dimension reduction or KNN graph | bio-conservation |
| iLISI | cell-type variable | - | Dimension reduction or KNN graph | batch correction |
| kBET | cell-type variable | - | Dimension reduction or KNN graph | batch correction |
| Graph connectivity | cell-type variable (per.component = TRUE) | - | KNN graph | batch correction |

Most scores are computed on an embedding (Seurat::DimReduc object) or a graph (Seurat::Neighbor or Seurat::Graph object). The exceptions are ARI and NMI, which compare two categorical variables and therefore need nothing beyond a cell-type variable and a cluster assignment variable.
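To illustrate why ARI and NMI need only two categorical variables, here is a minimal, language-agnostic sketch written in Python with the standard library only. It is not SeuratIntegrate's implementation. The function names `adjusted_rand_index` and `normalized_mutual_info` are hypothetical, and normalising NMI by the arithmetic mean of the two entropies is an assumption (other normalisations exist and the package may use a different one).

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(cell_types, clusters):
    """ARI between two equal-length label sequences (hypothetical helper)."""
    if len(cell_types) != len(clusters):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(cell_types), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treated as perfect agreement
    # Number of cell pairs grouped together in both labellings.
    together_both = sum(comb(c, 2) for c in Counter(zip(cell_types, clusters)).values())
    # Number of cell pairs grouped together in each labelling on its own.
    together_a = sum(comb(c, 2) for c in Counter(cell_types).values())
    together_b = sum(comb(c, 2) for c in Counter(clusters).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single group in both labellings
    return (together_both - expected) / (maximum - expected)


def normalized_mutual_info(cell_types, clusters):
    """NMI with arithmetic-mean normalisation (an assumption of this sketch)."""
    if len(cell_types) != len(clusters):
        raise ValueError("label sequences must have the same length")
    n = len(cell_types)
    count_a = Counter(cell_types)
    count_b = Counter(clusters)
    joint = Counter(zip(cell_types, clusters))
    # Mutual information from the contingency counts.
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    mean_entropy = (entropy_a + entropy_b) / 2
    if mean_entropy == 0:
        return 1.0  # both labellings contain a single group
    return mutual_info / mean_entropy


# Toy example: six cells, clusters match cell types up to renaming.
cell_types = ["T", "T", "B", "B", "NK", "NK"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(cell_types, clusters))
print(normalized_mutual_info(cell_types, clusters))
```

Both functions take only the two label vectors as input, with no embedding or graph, which matches the ARI and NMI rows of the table.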

Most scores are based on a cell type label variable. This is an estimate of each cell's type, obtained by analysing each batch separately or by using an automatic cell annotation algorithm. The estimate must be of sufficient quality to serve as ground truth.