
Compute a score based on the connectivity between cells sharing the same label. The score can be calculated in two ways: either from the relative size of the largest connected component of each label's subgraph (identical to the score used in Luecken M.D. et al., 2022), or directly on the whole graph, from the proportion of edges that link cells sharing the same label.

Usage

ScoreConnectivity(
  object,
  graph.name,
  cell.var,
  do.symmetrize = TRUE,
  per.component = TRUE,
  count.self = FALSE,
  weight.by.ncells = FALSE
)

AddScoreConnectivity(
  object,
  integration,
  graph.name,
  cell.var,
  do.symmetrize = TRUE,
  per.component = TRUE,
  count.self = FALSE,
  weight.by.ncells = FALSE
)

Arguments

object

A Seurat object

graph.name

The name of the knn graph to score.

cell.var

The name of the cell variable (must be in the object metadata).

do.symmetrize

whether to symmetrize the knn graph. Set to FALSE to disable (not recommended, see Details)

per.component

whether to use the same score as in Luecken M.D. et al., 2022. TRUE by default. See Details.

count.self

whether to account for loops (i.e. cells connected to themselves). FALSE by default

weight.by.ncells

whether to weight the score computed for each cell type label by that label's share of the total number of cells. By default (FALSE), the overall score is the unweighted mean of the per-label scores. Ignored when per.component = TRUE

integration

name of the integration to score

Value

ScoreConnectivity: a single float between 0 and 1, corresponding to the connectivity score.

AddScoreConnectivity: the updated Seurat object with the Graph connectivity score set for the integration.

Details

The default parameters (per.component = TRUE, count.self = FALSE) correspond to the original score from Luecken M.D. et al., 2022. It is computed as follows: for each cell type label \(c\) among the set of all labels \(L\), the subgraph \(subG_c\) composed exclusively of cells labelled \(c\) is extracted from the full graph \(G\). Then, the size (i.e. the number of cells, hence of vertices \(V\)) of its largest connected component \(CC\) is divided by the size of \(subG_c\). Finally, the score is the mean of these ratios over all labels: $$ratio_c = \frac{max(\left|V(CC(subG_c))\right|)}{\left|V(subG_c)\right|}$$ $$score = \frac{1}{\left| L \right|}\sum_{c \in L} ratio_c$$
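The per-component formula above can be traced by hand on a toy graph. The following is a minimal base R sketch, not the package's implementation: the adjacency matrix, the labels and the helper `largest_component_size()` are all hypothetical and only serve to illustrate \(ratio_c\) and the final mean.

```r
# Toy undirected graph on 6 cells (hypothetical data), stored as a
# symmetric 0/1 adjacency matrix without self-loops.
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # mirror each edge so the matrix is symmetric
labels <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: size of the largest connected component of a graph
# given as an adjacency matrix, found with a breadth-first search.
largest_component_size <- function(a) {
  n <- nrow(a)
  visited <- rep(FALSE, n)
  best <- 0
  for (start in seq_len(n)) {
    if (visited[start]) next
    queue <- start
    visited[start] <- TRUE
    size <- 0
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      size <- size + 1
      nb <- which(a[v, ] != 0 & !visited)  # unvisited neighbours of v
      visited[nb] <- TRUE
      queue <- c(queue, nb)
    }
    best <- max(best, size)
  }
  best
}

# ratio_c: largest component of the label's subgraph over the subgraph size
ratios <- sapply(split(seq_along(labels), labels), function(idx) {
  sub <- adj[idx, idx, drop = FALSE]
  largest_component_size(sub) / length(idx)
})

# Overall score: unweighted mean of the per-label ratios
score <- mean(ratios)
```

Here label A forms a single component (ratio 1), whereas cell 6 is cut off from the other B cells (ratio 2/3), so the score is 5/6. The edge between cells 3 and 4 joins two different labels and therefore plays no role in this variant.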

When per.component = FALSE, the connectivity is computed on the full graph \(G\). Consider the set of all labels \(L\), a label \(c \in L\) and the remaining labels \(L\prime = L \setminus \{c\}\). Denote by \(E_c = E_{c \to c}(G)\) the edges between cells labelled \(c\), and by \(E_c^{\prime} = E_{c \to L\prime}(G)\) the edges connecting cells labelled \(c\) to cells whose label is in \(L\prime\). When weight.by.ncells = TRUE, the score is computed as follows: $$score = \sum_{c \in L} \left( \frac{\left| V(subG_c) \right|}{\left| V(G) \right|} \times \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}\right)$$

When weight.by.ncells = FALSE, the score is the mean of the per-label edge ratios: $$score = \frac{1}{\left| L \right|}\sum_{c \in L} \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}$$
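To make the two whole-graph variants concrete, here is a small base R sketch on hypothetical data. It assumes that every non-zero entry `adj[i, j]` counts as one edge from cell i to cell j (so an undirected edge within a label contributes two entries) and that the diagonal is empty, mimicking count.self = FALSE. The package may count edges differently; the sketch only illustrates the two formulas.

```r
# Toy path graph 1-2-3-4-5-6 (hypothetical data), symmetric adjacency matrix.
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- cbind(1:5, 2:6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # mirror each edge so the matrix is symmetric
diag(adj) <- 0           # no self-loops, as with count.self = FALSE
labels <- c("A", "A", "A", "A", "B", "B")

groups <- split(seq_along(labels), labels)

# Per-label ratio |E_c| / (|E_c'| + |E_c|), counting non-zero matrix entries
# in the rows of label c (assumption stated above).
edge_ratio <- sapply(groups, function(idx) {
  within  <- sum(adj[idx, idx, drop = FALSE] != 0)   # c -> c
  between <- sum(adj[idx, -idx, drop = FALSE] != 0)  # c -> other labels
  within / (within + between)
})

# weight.by.ncells = FALSE: plain mean over labels
score_unweighted <- mean(edge_ratio)

# weight.by.ncells = TRUE: weight each label by its share of cells
prop <- lengths(groups) / length(labels)
score_weighted <- sum(prop * edge_ratio)
```

Under these assumptions the ratios are 6/7 for A and 2/3 for B, giving 16/21 without weighting and 50/63 with weighting: the larger, better-connected label A pulls the weighted score up.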

In either case, it is recommended to keep do.symmetrize = TRUE.
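As an illustration of what symmetrizing a knn graph means, the base R sketch below turns a directed nearest-neighbour matrix into an undirected one. The element-wise maximum used here is an assumption chosen for illustration; this page does not state which rule the function applies internally.

```r
# Directed 1-nearest-neighbour graph on 4 cells (hypothetical data):
# row i has a 1 in the column of the nearest neighbour of cell i.
knn <- matrix(0, nrow = 4, ncol = 4)
knn[cbind(1:4, c(2, 1, 2, 3))] <- 1

# Cell 3 points to cell 2, but cell 2 does not point back to cell 3,
# so the matrix is not symmetric.
directed_is_symmetric <- isSymmetric(knn)

# Assumed symmetrization rule: keep an edge if it exists in either direction.
sym <- pmax(knn, t(knn))
undirected_is_symmetric <- isSymmetric(sym)
```

After this step every neighbour relation is reciprocal, so the connected components and edge counts no longer depend on the direction in which a neighbour relation happened to be recorded.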

Note

This score is an adaptation of the graph connectivity score as described in Luecken M.D. et al., 2022.

References

Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022). DOI

See also