A wrapper to run scVI on a multi-layered Seurat V5 object. Requires a conda environment with scvi-tools and its dependencies.

Can be called via SeuratIntegrate::scVIIntegration() or scVIIntegration.fix()

Recommendations: use raw counts and all features (features = Features(object), layers = "counts")

Usage

scVIIntegration(
  object,
  groups = NULL,
  groups.name = NULL,
  labels.name = NULL,
  features = NULL,
  layers = "counts",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.reduction = "integrated.scVI",
  reduction.key = "scVIlatent_",
  torch.intraop.threads = 4L,
  torch.interop.threads = NULL,
  model.save.dir = NULL,
  ndims.out = 10,
  n_hidden = 128L,
  n_layers = 1L,
  dropout_rate = 0.1,
  dispersion = c("gene", "gene-batch", "gene-label", "gene-cell"),
  gene_likelihood = c("zinb", "nb", "poisson"),
  latent_distribution = c("normal", "ln"),
  max_epochs = NULL,
  train_size = 0.9,
  batch_size = 128L,
  seed.use = 42L,
  verbose = TRUE,
  verbose.scvi = c("INFO", "NOTSET", "DEBUG", "WARNING", "ERROR", "CRITICAL"),
  ...
)

scVIIntegration.fix(...)

Arguments

object

A Seurat object (or an Assay5 object if not called by IntegrateLayers)

groups

A named data frame with grouping information.

groups.name

Column name from groups data frame that stores grouping information. If groups.name = NULL, the first column is used

labels.name

Column name from groups data frame that stores cell label information. If labels.name = NULL, all cells are assigned the same label.
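As an illustration of how groups, groups.name and labels.name fit together, the sketch below builds a small grouping table in base R. The cell names, column names and values are hypothetical. It also assumes that "named" means the row names are cell names, as in the Seurat metadata returned by obj[[]].

```r
# Hypothetical metadata for four cells; in practice obj[[]] supplies this table.
groups <- data.frame(
  Method    = c("10x", "10x", "Smart-seq2", "Smart-seq2"),
  CellType  = c("B cell", "T cell", "B cell", "T cell"),
  row.names = paste0("cell", 1:4)  # assumption: row names are cell names
)

# With groups.name = NULL the first column is used, here "Method".
default_group_column <- colnames(groups)[1]

# Per-cell batch assignment that groups.name = "Method" would select.
batches <- setNames(groups[["Method"]], rownames(groups))
print(table(batches))
```

Passing labels.name = "CellType" would point at the second column of the same table.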

features

Vector of feature names to input to the integration method. When features = NULL (default), the VariableFeatures are used. To pass all features, use the output of Features()

layers

Name of the layers to use in the integration. 'counts' is highly recommended

scale.layer

Name of the scaled layer in Assay

conda_env

Path to conda environment to run scVI (should also contain the scipy python module). By default, uses the conda environment registered for scVI in the conda environment manager

new.reduction

Name of the new integrated dimensional reduction

reduction.key

Key for the new integrated dimensional reduction

torch.intraop.threads

Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_threads().

torch.interop.threads

Number of inter-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_interop_threads(). Can only be changed once, on first call.

model.save.dir

Path to a directory to save the model to. Uses SCVI.save(). Does not save anndata. Note that neither the trainer optimizer state nor the trainer history are saved. model.save.dir = NULL (default) disables saving the model.

ndims.out

Number of dimensions for new.reduction output. Corresponds to n_latent argument in the original API of SCVI

n_hidden

Number of nodes per hidden layer.

n_layers

Number of hidden layers used for encoder and decoder NNs.

dropout_rate

Dropout rate for neural networks.

dispersion

One of the following:

  • gene: dispersion parameter of NB is constant per gene across cells (default)

  • gene-batch: dispersion can differ between different batches

  • gene-label: dispersion can differ between different labels

  • gene-cell: dispersion can differ for every gene in every cell

gene_likelihood

One of the following:

  • zinb: Zero-inflated negative binomial distribution (default)

  • nb: Negative binomial distribution

  • poisson: Poisson distribution

latent_distribution

One of the following:

  • normal: Normal distribution (default)

  • ln: Logistic normal distribution (Normal(0, I) transformed by softmax)
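The choice arguments above (dispersion, gene_likelihood, latent_distribution) appear in Usage as character vectors whose first element is the documented default. This is the usual R idiom resolved with match.arg(). Whether scVIIntegration() uses exactly this mechanism internally is an assumption, and the helper pick_choice below is hypothetical.

```r
# Hypothetical helper mirroring the signature style shown in Usage.
pick_choice <- function(gene_likelihood = c("zinb", "nb", "poisson")) {
  # match.arg() returns the first element when the argument is left at its
  # default, and otherwise validates the supplied value against the choices.
  match.arg(gene_likelihood)
}

print(pick_choice())      # argument left at its default
print(pick_choice("nb"))  # explicit choice
```

With this idiom, a value outside the listed choices raises an error instead of being passed on silently.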

max_epochs

Number of passes through the dataset during training.

train_size

Fraction of cells assigned to the training set, in the range [0.0, 1.0]

batch_size

Minibatch size to use during training.
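To get a feel for how train_size and batch_size interact, here is a worked example in base R. The cell count is hypothetical and the rounding is an approximation. scvi-tools may split and batch the data slightly differently.

```r
n_cells    <- 10000L  # hypothetical dataset size
train_size <- 0.9     # default in Usage
batch_size <- 128L    # default in Usage

# Cells used for training; the remainder is assumed to be held out.
n_train <- round(train_size * n_cells)
n_valid <- n_cells - n_train

# Minibatches in one epoch, counting a final partial batch.
batches_per_epoch <- ceiling(n_train / batch_size)

cat(n_train, "training cells,", n_valid, "held-out cells,",
    batches_per_epoch, "minibatches per epoch\n")
```

Under these assumptions each epoch consists of 71 minibatches, so max_epochs multiplies that number of optimizer steps.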

seed.use

An integer to generate reproducible outputs. Set seed.use = NULL to disable

verbose

Print messages. Set to FALSE to disable

verbose.scvi

Verbosity level of scVI. From quietest to talkiest: CRITICAL, ERROR, WARNING, INFO (default), DEBUG, NOTSET

...

For scVIIntegration(), additional arguments to be passed to scvi.model.SCVI, SCVI.setup_anndata or SCVI.train (see Details section). For scVIIntegration.fix(), all of the above

Value

A list containing:

  • a new DimReduc of name new.reduction (key set to reduction.key) consisting of the latent space of the model with ndims.out dimensions.

When called via IntegrateLayers, a Seurat object with the new reduction and/or assay is returned

Details

This wrapper calls three Python functions through reticulate: scvi.model.SCVI, SCVI.setup_anndata and SCVI.train. The scVI-specific arguments are described in their documentation.

Note

This function requires the scvi-tools package to be installed (along with scipy)

References

Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat Methods 15, 1053–1058 (2018).

Examples

if (FALSE) { # \dontrun{
# Load the packages that provide the functions used below
library(Seurat)
library(SeuratIntegrate)

# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = 'Method')

# To enable cell label-guided correction, save the model, add other
# 'nuisance' factors and increase number of threads used:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = "Method",
                       labels.name = "CellType",
                       categorical_covariate_keys = list("Experiment"),
                       continuous_covariate_keys = list("percent.mito"),
                       torch.intraop.threads = 8L, model.save.dir = '~/Documents/scVI.model')
} # }
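
Once integration has run, the reduction named by new.reduction can feed the usual Seurat downstream steps. The sketch below wraps them in a hypothetical helper, cluster_on_scvi. It assumes the Seurat package is installed and that the defaults new.reduction = "integrated.scVI" and ndims.out = 10 were kept.

```r
# Hypothetical helper: neighbours, clusters and UMAP on the scVI latent space.
# 'obj' is assumed to be the Seurat object returned by
# IntegrateLayers(method = scVIIntegration, ...).
cluster_on_scvi <- function(obj, reduction = "integrated.scVI", ndims = 10) {
  dims <- seq_len(ndims)  # should not exceed ndims.out used at integration
  obj <- Seurat::FindNeighbors(obj, reduction = reduction, dims = dims)
  obj <- Seurat::FindClusters(obj)
  obj <- Seurat::RunUMAP(obj, reduction = reduction, dims = dims)
  obj
}

# Usage (not run): obj <- cluster_on_scvi(obj)
```

If a different new.reduction or ndims.out was used, pass the matching values through the reduction and ndims arguments.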