
Run scANVI on Seurat's Assay5 object through IntegrateLayers
Source: R/scANVI.R
A wrapper to run scANVI on a multi-layered Seurat V5 object.
Requires a conda environment with scvi-tools and the necessary dependencies.
Recommendations: use raw counts and all features
(features = Features(object), layers = "counts").
Usage
scANVIIntegration(
  object,
  groups = NULL,
  groups.name = NULL,
  labels.name = NULL,
  labels.null = NULL,
  features = NULL,
  layers = "counts",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.reduction = "integrated.scANVI",
  reduction.key = "scANVIlatent_",
  torch.intraop.threads = 4L,
  torch.interop.threads = NULL,
  model.save.dir = NULL,
  ndims.out = 10,
  n_hidden = 128L,
  n_layers = 1L,
  dropout_rate = 0.1,
  dispersion = c("gene", "gene-batch", "gene-label", "gene-cell"),
  gene_likelihood = c("zinb", "nb", "poisson"),
  linear_classifier = FALSE,
  max_epochs = NULL,
  train_size = 0.9,
  batch_size = 128L,
  seed.use = 42L,
  verbose = TRUE,
  verbose.scvi = c("INFO", "NOTSET", "DEBUG", "WARNING", "ERROR", "CRITICAL"),
  ...
)
Arguments
- object
- A Seurat object (or an Assay5 object if not called by IntegrateLayers)
- groups
- A named data frame with grouping information. Can also contain cell labels to guide scANVI. 
- groups.name
- Column name from the groups data frame that stores grouping information. If groups.name = NULL, the first column is used.
- labels.name
- Column name from the groups data frame that stores cell label information. If labels.name = NULL, all cells are assigned the same label.
- labels.null
- One value of groups[[labels.name]] that indicates unlabeled observations. labels.null = NULL means all labels are valid. Only applies when labels.name is not NULL.
- features
- Vector of feature names to input to the integration method. When features = NULL (default), the VariableFeatures are used. To pass all features, use the output of Features().
- layers
- Name of the layers to use in the integration. 'counts' is highly recommended 
- scale.layer
- Name of the scaled layer in Assay
- conda_env
- Path to conda environment to run scANVI (should also contain the scipy python module). By default, uses the conda environment registered for scANVI in the conda environment manager 
- new.reduction
- Name of the new integrated dimensional reduction 
- reduction.key
- Key for the new integrated dimensional reduction 
- torch.intraop.threads
- Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_threads().
- torch.interop.threads
- Number of inter-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_interop_threads(). Can only be changed once, on first call.
- model.save.dir
- Path to a directory to save the model to. Uses SCANVI.save(). Does not save anndata. Note that neither the trainer optimizer state nor the trainer history is saved. model.save.dir = NULL (default) disables saving the model.
- ndims.out
- Number of dimensions for the new.reduction output. Corresponds to the n_latent argument in the original API of SCANVI
- n_hidden
- Number of nodes per hidden layer. 
- n_layers
- Number of hidden layers used for encoder and decoder NNs. 
- dropout_rate
- Dropout rate for neural networks. 
- dispersion
- One of the following:
  - gene: dispersion parameter of NB is constant per gene across cells (default)
  - gene-batch: dispersion can differ between different batches
  - gene-label: dispersion can differ between different labels
  - gene-cell: dispersion can differ for every gene in every cell
 
- gene_likelihood
- One of the following:
  - zinb: Zero-inflated negative binomial distribution (default)
  - nb: Negative binomial distribution
  - poisson: Poisson distribution
 
- linear_classifier
- When switched to TRUE, uses a single linear layer for classification instead of a multi-layer perceptron.
- max_epochs
- Number of passes through the dataset for semisupervised training. 
- train_size
- Size of the training set, in the range [0.0, 1.0]
- batch_size
- Minibatch size to use during training. 
- seed.use
- An integer to generate reproducible outputs. Set seed.use = NULL to disable
- verbose
- Print messages. Set to FALSE to disable
- verbose.scvi
- Verbosity level of scANVI. From quietest to talkiest: CRITICAL, ERROR, WARNING, INFO (default), DEBUG, NOTSET 
- ...
- Additional arguments to be passed to scvi.model.SCANVI, SCANVI.setup_anndata or SCANVI.train (see Details section)
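The groups, groups.name, labels.name and labels.null arguments work together. The following is a minimal base R sketch of how they relate. The cell names and the metadata values are hypothetical (the column names mirror the Examples section), and the data frame is assumed to carry cell names as row names, as obj[[]] does. It only builds and inspects the data frame and does not call scANVI.

```r
# Hypothetical per-cell metadata, with cell names as row names.
groups <- data.frame(
  Method   = c("10x", "10x", "Smart-seq2", "Smart-seq2"),
  CellType = c("B cell", "Unassigned", "T cell", "Unassigned"),
  row.names = paste0("cell", 1:4),
  stringsAsFactors = FALSE
)

groups.name <- "Method"      # column holding the batch of each cell
labels.name <- "CellType"    # column holding the (partial) cell labels
labels.null <- "Unassigned"  # value marking unlabeled cells

# Cells that would count as unlabeled under these settings
is_unlabeled <- groups[[labels.name]] == labels.null
n_unlabeled <- sum(is_unlabeled)

# Number of distinct batches found in the grouping column
n_batches <- length(unique(groups[[groups.name]]))
```

With labels.null = NULL, every value in the CellType column, including "Unassigned", would be treated as a valid label.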
Value
A list containing:
- a new DimReduc of name new.reduction (key set to reduction.key) consisting of the latent space of the model with ndims.out dimensions.
When called via IntegrateLayers, a Seurat object with
the new reduction and/or assay is returned
Details
This wrapper calls three Python functions through reticulate. Find the scANVI-specific arguments there:
- model initialization: scvi.model.SCANVI, which relies on scvi.module.SCANVAE which in turn relies on scvi.module.VAE 
- anndata setup: SCANVI.setup_anndata 
- training: SCANVI.train 
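To make the role of the three functions concrete, here is a hedged base R sketch of how extra arguments given through ... could be sorted by the function that understands them. This is not the package's actual implementation: route_args is a hypothetical helper, and the argument names listed per target are a partial, illustrative subset of the scvi-tools signatures.

```r
# Illustrative, partial argument lists for the three Python callables.
# These subsets are assumptions for the sketch, not the full signatures.
targets <- list(
  model = c("n_hidden", "n_latent", "n_layers", "dropout_rate"),
  setup_anndata = c("categorical_covariate_keys", "continuous_covariate_keys"),
  train = c("max_epochs", "train_size", "batch_size", "check_val_every_n_epoch")
)

# Hypothetical helper: split a named list of extra arguments by target.
route_args <- function(extra, targets) {
  lapply(targets, function(known) extra[intersect(names(extra), known)])
}

# Extra arguments as a user might supply them through `...`
extra <- list(
  categorical_covariate_keys = "Experiment",
  continuous_covariate_keys = "percent.mito",
  check_val_every_n_epoch = 5L
)
routed <- route_args(extra, targets)
```

In this sketch, the two covariate arguments end up under setup_anndata and check_val_every_n_epoch under train, which is why it helps to look up each argument in the documentation of the matching scvi-tools function.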
Note
This function requires the scvi-tools package to be installed (along with scipy)
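One way to check this requirement before launching an integration is to query the Python modules through reticulate. The sketch below is an assumption-laden helper, not part of the package: check_scanvi_env is a hypothetical name, "scvi-tools" is the environment name used in the Examples section, and the check bypasses any environment registered in the package's own conda environment manager.

```r
# Sketch: check that the Python side is usable before integrating.
# `check_scanvi_env` is a hypothetical helper; "scvi-tools" is an assumed
# conda environment name.
check_scanvi_env <- function(conda_env = "scvi-tools") {
  missing <- c(scvi = FALSE, scipy = FALSE)
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(missing)
  }
  tryCatch({
    # Errors if conda or the environment cannot be found
    reticulate::use_condaenv(conda_env, required = TRUE)
    c(scvi  = reticulate::py_module_available("scvi"),
      scipy = reticulate::py_module_available("scipy"))
  }, error = function(e) missing)
}
```

Calling check_scanvi_env() returns a named logical vector; both entries should be TRUE before running scANVIIntegration with that environment.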
References
Kingma, D. P., Rezende, D. J., Mohamed, S. & Welling, M. Semi-Supervised Learning with Deep Generative Models. Preprint at arXiv (2014).
Xu, C., Lopez, R., Mehlman, E., Regier, J., Jordan, M. I. & Yosef, N. Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models. Molecular Systems Biology 17 (2021).
Examples
if (FALSE) { # \dontrun{
# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = scANVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = 'Method',
                       labels.name = 'CellType', labels.null = 'Unassigned')
# To enable saving the model, add other 'nuisance' factors and increase number of threads used:
obj <- IntegrateLayers(object = obj, method = scANVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = "Method",
                       labels.name = "CellType", labels.null = "Unassigned",
                       categorical_covariate_keys = "Experiment",
                       continuous_covariate_keys = "percent.mito",
                       torch.intraop.threads = 8L, model.save.dir = '~/Documents/scANVI.model')
} # }
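After integration, the new reduction can feed the usual Seurat downstream steps. The following is a sketch under stated assumptions: Seurat is installed, obj has been through IntegrateLayers as in the examples above, and embed_scanvi and the "umap.scanvi" reduction name are hypothetical choices made for illustration.

```r
# Sketch: use the integrated latent space for clustering and a UMAP.
# `embed_scanvi` is a hypothetical helper; it expects an object that already
# holds the reduction produced by scANVIIntegration.
embed_scanvi <- function(obj, reduction = "integrated.scANVI", ndims = 10L) {
  dims <- seq_len(ndims)  # should match ndims.out used at integration
  obj <- Seurat::FindNeighbors(obj, reduction = reduction, dims = dims)
  obj <- Seurat::FindClusters(obj)
  Seurat::RunUMAP(obj, reduction = reduction, dims = dims,
                  reduction.name = "umap.scanvi")
}
```

A typical call would be obj <- embed_scanvi(obj), followed by DimPlot(obj, reduction = "umap.scanvi", group.by = "Method") to inspect how well the batches mix.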