Score a knn graph based on cell-type label connectivity
Source: R/metrics_connectivity.R (score-connectivity.Rd)
Compute scores based on the connectivity between cells sharing the same label. The score can be calculated in two ways: either by quantifying, for each cell label, the largest connected component of that label's subgraph (identical to the score used in Luecken M.D. et al., 2022), or directly on the whole graph.
Usage
ScoreConnectivity(
object,
graph.name,
cell.var,
do.symmetrize = TRUE,
per.component = TRUE,
count.self = FALSE,
weight.by.ncells = FALSE
)
AddScoreConnectivity(
object,
integration,
graph.name,
cell.var,
do.symmetrize = TRUE,
per.component = TRUE,
count.self = FALSE,
weight.by.ncells = FALSE
)
Arguments
- object
A Seurat object
- graph.name
The name of the knn graph to score.
- cell.var
The name of the cell variable (must be in the object metadata).
- do.symmetrize
Whether to symmetrize the knn graph. Set to FALSE to disable (not recommended, see Details).
- per.component
Whether to use the same score as in Luecken M.D. et al., 2022. TRUE by default. See Details.
- count.self
Whether to account for loops (i.e. cells connected to themselves). FALSE by default.
- weight.by.ncells
Whether to weight the connectivity-derived scores computed on each cell type label by their relative proportion in number of cells. By default (FALSE), the overall score is computed as the mean of scores computed per label. Ignored when per.component = TRUE.
- integration
Name of the integration to score.
Value
ScoreConnectivity: a single float between 0 and 1, corresponding to the connectivity score.
AddScoreConnectivity: the updated Seurat object with the Graph connectivity score set for the integration.
Details
The default parameters (per.component = TRUE, count.self = FALSE)
correspond to the original score from Luecken M.D. et al., 2022. It is
computed as follows: for each cell type label \(c\) among the set of all
labels \(L\), the subgraph \(subG_c\) composed exclusively of cells
labelled \(c\) is extracted from the full graph \(G\). Then, the size (i.e.
the number of cells, hence of vertices \(V\)) of its largest connected
component \(CC\) is divided by the size of \(subG_c\). Finally, the mean of
the subgraphs' scores is computed:
$$ratio_c = \frac{max(\left|V(CC(subG_c))\right|)}{\left|V(subG_c)\right|}$$
$$score = \frac{1}{\left| L \right|}\sum_{c \in L} ratio_c$$
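The per-component score can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy graph, the labels and the helper names `largest_component_size()` and `per_component_score()` are hypothetical, and a dense symmetric adjacency matrix stands in for the knn graph stored in the Seurat object.

```r
# Hypothetical toy graph: 6 cells, undirected edges 1-2, 2-3, 3-4 and 4-5.
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1          # fill the mirrored entries (symmetric graph)
labels <- c("A", "A", "A", "B", "B", "B")

# Size of the largest connected component, found by breadth-first search.
largest_component_size <- function(a) {
  n <- nrow(a)
  visited <- rep(FALSE, n)
  best <- 0
  for (start in seq_len(n)) {
    if (visited[start]) next
    queue <- start
    visited[start] <- TRUE
    size <- 0
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      size <- size + 1
      nb <- which(a[v, ] != 0 & !visited)   # unvisited neighbours of v
      visited[nb] <- TRUE
      queue <- c(queue, nb)
    }
    best <- max(best, size)
  }
  best
}

# ratio_c for each label, then the mean over labels.
per_component_score <- function(adj, labels) {
  ratios <- sapply(unique(labels), function(lab) {
    idx <- which(labels == lab)
    sub <- adj[idx, idx, drop = FALSE]      # subG_c: only cells labelled c
    largest_component_size(sub) / length(idx)
  })
  mean(ratios)
}

score <- per_component_score(adj, labels)
print(score)   # 5/6, about 0.833
```

In this toy graph, label A forms a single connected component (ratio 3/3), whereas cell 6 is isolated from the other B cells (ratio 2/3), so the score is (1 + 2/3) / 2. The edge 3-4 links two different labels and therefore does not appear in either subgraph.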
When per.component = FALSE, the connectivity is computed on the full
graph \(G\). Consider the set of all labels \(L\), a label
\(c \in L\) and \(L\prime = L \setminus \{c\}\). Denote by
\(E_c = E_{c \to c}(G)\) the edges between cells labelled \(c\),
and by \(E_c^{\prime} = E_{c \to L\prime}(G)\) the edges connecting cells
labelled \(c\) with cells whose label is in \(L\prime\).
When weight.by.ncells = TRUE, the score is computed as follows:
$$score = \sum_{c \in L} \left( \frac{\left| V(subG_c) \right|}{\left| V(G) \right|} \times \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}\right)$$
When weight.by.ncells = FALSE, the score is the mean of the per-label edge ratios:
$$score = \frac{1}{\left| L \right|}\sum_{c \in L} \frac{\left| E_c \right|}{\left| E_c^\prime \right| + \left| E_c \right|}$$
In either case, it is recommended to keep do.symmetrize = TRUE.
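The whole-graph variant (per.component = FALSE) can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the helper `edge_ratio_score()` and the toy graph are hypothetical, each non-zero entry in the rows of label \(c\) is counted as one edge leaving a cell labelled \(c\), and the diagonal is zeroed to mimic count.self = FALSE. The package may count edges differently.

```r
# Hypothetical toy graph: 6 cells on a path 1-2-3-4-5-6, symmetric adjacency.
adj <- matrix(0, nrow = 6, ncol = 6)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1
labels <- c("A", "A", "A", "A", "B", "B")

edge_ratio_score <- function(adj, labels, weight.by.ncells = FALSE) {
  diag(adj) <- 0                     # ignore loops, as with count.self = FALSE
  labs <- unique(labels)
  ratios <- sapply(labs, function(lab) {
    is_c <- labels == lab
    within  <- sum(adj[is_c, is_c, drop = FALSE] != 0)    # |E_c|
    outside <- sum(adj[is_c, !is_c, drop = FALSE] != 0)   # |E_c'|
    within / (within + outside)      # NaN if a label has no edge at all
  })
  if (weight.by.ncells) {
    # Weight each label by its share of cells: |V(subG_c)| / |V(G)|.
    prop <- as.numeric(table(labels)[labs]) / length(labels)
    sum(prop * ratios)
  } else {
    mean(ratios)
  }
}

unweighted <- edge_ratio_score(adj, labels)
weighted <- edge_ratio_score(adj, labels, weight.by.ncells = TRUE)
print(unweighted)   # 16/21, about 0.762
print(weighted)     # 50/63, about 0.794
```

Here label A has 6 within-label entries and 1 entry towards B (ratio 6/7), while label B has 2 within-label entries and 1 towards A (ratio 2/3). Weighting by cell proportions (4/6 and 2/6) pulls the score towards the larger label A.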
Note
This score is an adaptation of the graph connectivity score as described in Luecken M.D. et al., 2022.
References
Luecken, M. D., Büttner, M., Chaichoompu, K., Danese, A., Interlandi, M., Mueller, M. F., Strobl, D. C., Zappia, L., Dugas, M., Colomé-Tatché, M. & Theis, F. J. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19, 41–50 (2022).