Run bbknn on Seurat's Assay5 object through IntegrateLayers

A wrapper to run bbknn on a multi-layered Seurat V5 object. Requires a conda environment with bbknn and its dependencies.

Usage

bbknnIntegration(
  object,
  orig,
  groups = NULL,
  groups.name = NULL,
  layers = "data",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.graph = "bbknn",
  new.reduction = "pca.bbknn",
  reduction.key = "bbknnPCA_",
  reconstructed.assay = "bbknn.ridge",
  ndims = 50L,
  ndims.use = 30L,
  ridge_regression = TRUE,
  graph.use = c("connectivities", "distances"),
  verbose = TRUE,
  seed.use = 42L,
  ...
)

Arguments

object: A Seurat object (or an Assay5 object if not called by IntegrateLayers)
orig: DimReduc object. Not to be set directly when called with IntegrateLayers, use orig.reduction argument instead
groups: A named data frame with grouping information. Preferably one-column when groups.name = NULL
groups.name: Column name from groups data frame that stores grouping information. If groups.name = NULL, the first column is used
layers: Name of the layers to use in the integration
scale.layer: Name of the scaled layer in Assay
conda_env: Path to conda environment to run bbknn (should also contain the scipy python module). By default, uses the conda environment registered for bbknn in the conda environment manager
new.graph: Name of the Graph object
new.reduction: Name of the new integrated dimensional reduction
reduction.key: Key for the new integrated dimensional reduction
reconstructed.assay: Name for the assay containing the corrected expression matrix
ndims: Number of dimensions for the new PCA computed on the first output of bbknn. 50 by default. Ignored when ridge_regression = FALSE
ndims.use: Number of dimensions from orig to use for bbknn, and from newly computed PCA when ridge_regression = TRUE.
ridge_regression: When set to TRUE (default), new clusters are computed on the output of bbknn, then a ridge regression is performed to remove technical variables while preserving biological variables. Then, a new bbknn run is performed.
graph.use: Which graph(s) of bbknn to output. At least one of "connectivities" or "distances". If both are provided (default) and ridge_regression = TRUE, the first one ("connectivities" by default, recommended) is used as input for computing clusters.
verbose: Print messages. Set to FALSE to disable
seed.use: An integer to generate reproducible outputs. Set seed.use = NULL to disable
...: Additional arguments to be passed to bbknn.bbknn(). When ridge_regression = TRUE, also accepts arguments to pass to Seurat::FindClusters(), Seurat::RunPCA() and bbknn.ridge_regression(). See Details section
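
The groups and groups.name arguments can be illustrated with a small base R sketch. It assumes that "named" means the row names of the data frame are cell names; the cell names and batch labels below are hypothetical, and the column lookup only mimics the documented fallback rather than reproducing the package's internal code.

```r
# Hypothetical cell names and batch labels, for illustration only
cells <- c("cell_1", "cell_2", "cell_3", "cell_4")

# Assumption: row names of `groups` identify the cells
groups <- data.frame(
  Method = c("10x", "10x", "Smart-seq2", "Smart-seq2"),
  row.names = cells
)

# Mimic the documented behaviour: with groups.name = NULL,
# the first column of `groups` holds the grouping information
groups.name <- NULL
batch_column <- if (is.null(groups.name)) colnames(groups)[1] else groups.name

# Named vector mapping each cell to its batch
batch <- setNames(as.character(groups[[batch_column]]), rownames(groups))
print(batch)
```

A one-column data frame such as this one avoids any ambiguity when groups.name is left as NULL.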

Value

A list containing at least one of:

1 or 2 new Graph(s) named [new.graph]_scale.data_[graph.use], corresponding to the output(s) of the first run of bbknn
a new Assay named reconstructed.assay with corrected counts for each feature from scale.layer
a new DimReduc (PCA) named new.reduction (key set to reduction.key)
1 or 2 new Graph(s) named [new.graph]_ridge.residuals_[graph.use], corresponding to the output(s) of the second run of bbknn

[graph.use] can take two values (either "connectivities" or "distances"), depending on the graph.use parameter.

When called via IntegrateLayers, a Seurat object with the new reduction and/or assay is returned
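
The naming scheme for the Graph outputs can be sketched in base R. The helper expected_graph_names() is hypothetical (it is not part of the package) and simply pastes together the pattern described above. It assumes that the graphs from the first run are kept alongside those from the second run when ridge_regression = TRUE.

```r
# Hypothetical helper, not part of the package: builds the graph names
# following the [new.graph]_[run]_[graph.use] pattern described above
expected_graph_names <- function(new.graph = "bbknn",
                                 graph.use = c("connectivities", "distances"),
                                 ridge_regression = TRUE) {
  graph.use <- match.arg(graph.use, several.ok = TRUE)
  # First bbknn run; assumption: the second run is added only with ridge regression
  runs <- "scale.data"
  if (ridge_regression) runs <- c(runs, "ridge.residuals")
  # One name per (run, graph type) combination
  as.vector(outer(runs, graph.use,
                  function(r, g) paste(new.graph, r, g, sep = "_")))
}

graph_names <- expected_graph_names()
print(graph_names)

# Only the connectivities graph, without ridge regression
print(expected_graph_names(graph.use = "connectivities", ridge_regression = FALSE))
```

Checking names of this form against the object actually returned is a quick way to confirm which graphs were produced for a given call.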

Details

This wrapper calls three Python functions through reticulate. See their documentation for the bbknn-specific arguments:

bbknn function: bbknn.bbknn, which relies on bbknn.matrix.bbknn
ridge regression: bbknn.ridge_regression, which relies on sklearn.linear_model.Ridge

Note

This function requires the bbknn Python package to be installed (along with scipy).

References

Polański, K., Young, M. D., Miao, Z., Meyer, K. B., Teichmann, S. A. & Park, J.-E. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36, 964–965 (2019).

Examples

if (FALSE) { # \dontrun{
# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = bbknnIntegration,
                       conda_env = 'bbknn', groups = obj[[]],
                       groups.name = 'Method')

# To disable the ridge regression and subsequent steps:
obj <- IntegrateLayers(object = obj, method = bbknnIntegration,
                       conda_env = 'bbknn', groups = obj[[]],
                       groups.name = 'Method', ridge_regression = FALSE)
} # }
