Score a knn graph based on cell-type label connectivity

Compute scores based on the connectivity between cells sharing the same label. The score can be calculated in two ways: either per label, as the fraction of cells that belong to the largest connected component of that label's subgraph (identical to the score used in Luecken M.D. et al., 2022), or directly on the whole graph.

Usage

ScoreConnectivity(
  object,
  graph.name,
  cell.var,
  do.symmetrize = TRUE,
  per.component = TRUE,
  count.self = FALSE,
  weight.by.ncells = FALSE
)

AddScoreConnectivity(
  object,
  integration,
  graph.name,
  cell.var,
  do.symmetrize = TRUE,
  per.component = TRUE,
  count.self = FALSE,
  weight.by.ncells = FALSE
)

Arguments

object: A Seurat object
graph.name: The name of the knn graph to score.
cell.var: The name of the cell variable (must be in the object metadata).
do.symmetrize: whether to symmetrize the knn graph. Set to FALSE to disable (not recommended, see Details)
per.component: whether to use the same score as in Luecken M.D. et al., 2022. TRUE by default. See Details.
count.self: whether to account for loops (i.e. cells connected to themselves). FALSE by default
weight.by.ncells: whether to weight the connectivity-derived scores computed on each cell type label by their relative proportion in number of cells. By default (FALSE), the overall score is computed as the mean of scores computed per label. Ignored when per.component = TRUE
integration: name of the integration to score

Value

ScoreConnectivity: a single float between 0 and 1, corresponding to the connectivity score.

AddScoreConnectivity: the updated Seurat object with the Graph connectivity score set for the integration.

Details

The default parameters (per.component = TRUE, count.self = FALSE) correspond to the original score from Luecken M.D. et al., 2022. It is computed as follows: for each cell type label $c$ among the set of all labels $L$, the subgraph $subG_c$ composed exclusively of cells labelled $c$ is extracted from the full graph $G$. Then, the size (i.e. the number of cells, hence of vertices $V$) of its largest connected component $CC$ is divided by the size of $subG_c$. Finally, the mean of the subgraphs' scores is computed: $$ratio_c = \frac{\max(\left|V(CC(subG_c))\right|)}{\left|V(subG_c)\right|}$$ $$score = \frac{1}{\left| L \right|}\sum_{c \in L} ratio_c$$
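The per-label computation above can be illustrated with a small base R sketch. This is not the package's implementation: the toy adjacency matrix, the labels and the helper `largest_component_size()` are hypothetical, and the graph is assumed to be already symmetrized.

```r
# Toy undirected graph stored as a symmetric 0/1 adjacency matrix (made-up data).
# Cells 1-3 carry label "A", cells 4-6 carry label "B".
labels <- c("A", "A", "A", "B", "B", "B")
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # mirror each edge so the matrix is symmetric

# Hypothetical helper: size of the largest connected component of the graph
# described by adjacency matrix m, found by breadth-first search.
largest_component_size <- function(m) {
  n <- nrow(m)
  visited <- rep(FALSE, n)
  best <- 0
  for (start in seq_len(n)) {
    if (visited[start]) next
    queue <- start
    visited[start] <- TRUE
    size <- 0
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      size <- size + 1
      nb <- which(m[v, ] > 0 & !visited)  # unvisited neighbours of v
      visited[nb] <- TRUE
      queue <- c(queue, nb)
    }
    best <- max(best, size)
  }
  best
}

# ratio_c for each label: largest component size over subgraph size.
ratios <- sapply(unique(labels), function(lab) {
  idx <- which(labels == lab)
  sub <- adj[idx, idx, drop = FALSE]  # subgraph restricted to label `lab`
  largest_component_size(sub) / length(idx)
})

# Overall score: unweighted mean of the per-label ratios.
score <- mean(ratios)
```

In this toy graph, label "A" is fully connected within itself (ratio 1), while cell 6 is isolated from the other "B" cells (ratio 2/3), so the score is 5/6. The edge between cells 3 and 4 joins two different labels and therefore plays no role here.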

When per.component = FALSE, the connectivity is computed on the full graph $G$. Consider the set of all labels $L$, a label $c \in L$ and $L\prime = L \setminus \{c\}$. Denote by $E_c = E_{c \to c}(G)$ the edges between cells labelled $c$, and by $E_c^{\prime} = E_{c \to L\prime}(G)$ the edges connecting cells labelled $c$ to cells whose label is in $L\prime$. When weight.by.ncells = TRUE, the score is computed as follows: $$score = \sum_{c \in L} \left( \frac{\left| V(subG_c) \right|}{\left| V(G) \right|} \times \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}\right)$$

When weight.by.ncells = FALSE, the score is the mean of the per-label edge ratios: $$score = \frac{1}{\left| L \right|}\sum_{c \in L} \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}$$

In either case, it is recommended to keep do.symmetrize = TRUE.
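The two per.component = FALSE formulas can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy graph and labels are made up, and edges are counted as directed entries of the symmetrized adjacency matrix (so each within-label edge contributes two entries to $E_c$, and each cross-label edge contributes one entry to $E_c^{\prime}$ from the side of label $c$). The function's actual counting convention may differ.

```r
# Toy path graph 1-2-3-4-5-6 as a symmetric 0/1 adjacency matrix (made-up data).
# Cells 1-4 carry label "A", cells 5-6 carry label "B".
labels <- c("A", "A", "A", "A", "B", "B")
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # mirror each edge so the matrix is symmetric
diag(adj) <- 0          # drop loops, mirroring count.self = FALSE

# Per-label ratio |E_c| / (|E_c'| + |E_c|), counting adjacency entries
# (assumption: directed counting on the symmetrized matrix).
edge_ratio <- sapply(unique(labels), function(lab) {
  is_c <- labels == lab
  within <- sum(adj[is_c, is_c])   # entries between cells labelled `lab`
  across <- sum(adj[is_c, !is_c])  # entries from `lab` cells to other labels
  within / (within + across)
})

# Relative proportion of cells per label, in the same order as edge_ratio.
prop <- as.numeric(table(labels)[unique(labels)]) / length(labels)

score_unweighted <- mean(edge_ratio)        # analogue of weight.by.ncells = FALSE
score_weighted   <- sum(prop * edge_ratio)  # analogue of weight.by.ncells = TRUE
```

Here label "A" has a ratio of 6/7 and label "B" a ratio of 2/3. The unweighted mean is 16/21, whereas weighting by cell proportions (4/6 and 2/6) gives 50/63, pulling the score towards the larger label.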

Note

This score is an adaptation of the graph connectivity score as described in Luecken M.D. et al., 2022.

References