Run scANVI on Seurat's Assay5 object through IntegrateLayers — scANVIIntegration • SeuratIntegrate

A wrapper to run scANVI on multi-layered Seurat V5 object. Requires a conda environment with scvi-tools and necessary dependencies

Recommendations: use raw counts and all features (features = Features(object), layers = "counts")

Usage

scANVIIntegration(
  object,
  groups = NULL,
  groups.name = NULL,
  labels.name = NULL,
  labels.null = NULL,
  features = NULL,
  layers = "counts",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.reduction = "integrated.scANVI",
  reduction.key = "scANVIlatent_",
  torch.intraop.threads = 4L,
  torch.interop.threads = NULL,
  model.save.dir = NULL,
  ndims.out = 10,
  n_hidden = 128L,
  n_layers = 1L,
  dropout_rate = 0.1,
  dispersion = c("gene", "gene-batch", "gene-label", "gene-cell"),
  gene_likelihood = c("zinb", "nb", "poisson"),
  linear_classifier = FALSE,
  max_epochs = NULL,
  train_size = 0.9,
  batch_size = 128L,
  seed.use = 42L,
  verbose = TRUE,
  verbose.scvi = c("INFO", "NOTSET", "DEBUG", "WARNING", "ERROR", "CRITICAL"),
  ...
)

Arguments

object

A Seurat object (or an Assay5 object if not called by IntegrateLayers)

groups

A named data frame with grouping information. Can also contain cell labels to guide scANVI.

groups.name

Column name from groups data frame that stores grouping information. If groups.name = NULL, the first column is used

labels.name

Column name from groups data frame that stores cell label information. If labels.name = NULL, all cells are assigned the same label.

labels.null

One value of groups$labels.name that indicates unlabeled observations. labels.null = NULL means all labels are valid. Only applies when labels.name != NULL.

features

Vector of feature names to input to the integration method. When features = NULL (default), the VariableFeatures are used. To pass all features, use the output of Features()

layers

Name of the layers to use in the integration. 'counts' is highly recommended

scale.layer

Name of the scaled layer in Assay

conda_env

Path to conda environment to run scANVI (should also contain the scipy python module). By default, uses the conda environment registered for scANVI in the conda environment manager

new.reduction

Name of the new integrated dimensional reduction

reduction.key

Key for the new integrated dimensional reduction

torch.intraop.threads

Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_threads().

torch.interop.threads

Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_interop_threads(). Can only be changed once, on first call.

model.save.dir

Path to a directory to save the model to. Uses SCANVI.save(). Does not save anndata. Note that neither the trainer optimizer state nor the trainer history are saved. model.save.dir = NULL (default) disables saving the model.

ndims.out

Number of dimensions for new.reduction output. Corresponds to n_latent argument in the original API of SCANVI

n_hidden

Number of nodes per hidden layer.

n_layers

Number of hidden layers used for encoder and decoder NNs.

dropout_rate

Dropout rate for neural networks.

dispersion

One of the following:

gene: dispersion parameter of NB is constant per gene across cells (default)
gene-batch: dispersion can differ between different batches
gene-label: dispersion can differ between different labels
gene-cell: dispersion can differ for every gene in every cell

gene_likelihood

One of the following:

zinb: Zero-inflated negative binomial distribution (default)
nb: Negative binomial distribution
poisson: Poisson distribution

linear_classifier

When switched to TRUE, uses a single linear layer for classification instead of a multi-layer perceptron.

max_epochs

Number of passes through the dataset for semisupervised training.

train_size

Size of training set in the range [0.0, 1.0]

batch_size

Minibatch size to use during training.

seed.use

An integer to generate reproducible outputs. Set seed.use = NULL to disable

verbose

Print messages. Set to FALSE to disable

verbose.scvi

Verbosity level of scANVI. From quietest to talkiest: CRITICAL, ERROR, WARNING, INFO (default), DEBUG, NOTSET

...

Additional arguments to be passed to scvi.model.SCANVI, SCANVI.setup_anndata or SCANVI.train (see Details section)

Value

A list containing:

a new DimReduc of name new.reduction (key set to reduction.key) consisting of the latent space of the model with ndims.out dimensions.

When called via IntegrateLayers, a Seurat object with the new reduction and/or assay is returned

Details

This wrappers calls three python functions through reticulate. Find the scVANVI-specific arguments there:

model initiation: scvi.model.SCANVI, which relies on scvi.module.SCANVAE which in turn relies on scvi.module.VAE
anndata setup: SCANVI.setup_anndata
training: SCANVI.train

Note

This function requires the scvi-tools package to be installed (along with scipy)

References

Kingma, D. P., Rezende, D. J., Mohamed, S. & Welling, M. Semi- Supervised Learning with Deep Generative Models. Preprint at arXiv (2014). DOI

Xu, C., Lopez, R., Mehlman, E., Regier, J., Jordan, M. I. & Yosef, N. Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models. Molecular Systems Biology 17, (2021). DOI

Examples

if (FALSE) { # \dontrun{
# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = scANVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = 'Method',
                       labels.name = 'CellType', labels.null = 'Unassigned')

# To enable saving the model, add other 'nuisance' factors and increase number of threads used:
obj <- IntegrateLayers(object = obj, method = scANVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = "Method",
                       labels.name = "CellType", labels.null = "Unassigned",
                       categorical_covariate_keys = "Experiment",
                       continuous_covariate_keys = "percent.mito",
                       ncores = 8, model.save.dir = '~/Documents/scANVI.model')
} # }

Run scANVI on Seurat's Assay5 object through `IntegrateLayers`