A wrapper to run scVI on a multi-layered Seurat V5 object.
Requires a conda environment with scvi-tools and its dependencies.
Can be called via SeuratIntegrate::scVIIntegration() or
scVIIntegration.fix().
Recommendations: use raw counts and all features
(features = Features(object), layers = "counts").
Usage
scVIIntegration(
  object,
  groups = NULL,
  groups.name = NULL,
  labels.name = NULL,
  features = NULL,
  layers = "counts",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.reduction = "integrated.scVI",
  reduction.key = "scVIlatent_",
  torch.intraop.threads = 4L,
  torch.interop.threads = NULL,
  model.save.dir = NULL,
  ndims.out = 10,
  n_hidden = 128L,
  n_layers = 1L,
  dropout_rate = 0.1,
  dispersion = c("gene", "gene-batch", "gene-label", "gene-cell"),
  gene_likelihood = c("zinb", "nb", "poisson"),
  latent_distribution = c("normal", "ln"),
  max_epochs = NULL,
  train_size = 0.9,
  batch_size = 128L,
  seed.use = 42L,
  verbose = TRUE,
  verbose.scvi = c("INFO", "NOTSET", "DEBUG", "WARNING", "ERROR", "CRITICAL"),
  ...
)
scVIIntegration.fix(...)
Arguments
- object
A Seurat object (or an Assay5 object if not called by IntegrateLayers)
- groups
A named data frame with grouping information.
- groups.name
Column name from the groups data frame that stores grouping information. If groups.name = NULL, the first column is used
- labels.name
Column name from the groups data frame that stores cell label information. If labels.name = NULL, all cells are assigned the same label.
- features
Vector of feature names to input to the integration method. When features = NULL (default), the VariableFeatures are used. To pass all features, use the output of Features()
- layers
Name of the layers to use in the integration. 'counts' is highly recommended
- scale.layer
Name of the scaled layer in Assay
- conda_env
Path to conda environment to run scVI (should also contain the scipy python module). By default, uses the conda environment registered for scVI in the conda environment manager
- new.reduction
Name of the new integrated dimensional reduction
- reduction.key
Key for the new integrated dimensional reduction
- torch.intraop.threads
Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_threads().
- torch.interop.threads
Number of inter-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_interop_threads(). Can only be changed once, on first call.
- model.save.dir
Path to a directory to save the model to. Uses SCVI.save(). Does not save anndata. Note that neither the trainer optimizer state nor the trainer history are saved. model.save.dir = NULL (default) disables saving the model.
- ndims.out
Number of dimensions for new.reduction output. Corresponds to the n_latent argument in the original API of SCVI
- n_hidden
Number of nodes per hidden layer.
- n_layers
Number of hidden layers used for encoder and decoder NNs.
- dropout_rate
Dropout rate for neural networks.
- dispersion
One of the following:
  - gene: dispersion parameter of NB is constant per gene across cells (default)
  - gene-batch: dispersion can differ between different batches
  - gene-label: dispersion can differ between different labels
  - gene-cell: dispersion can differ for every gene in every cell
- gene_likelihood
One of the following:
  - zinb: Zero-inflated negative binomial distribution (default)
  - nb: Negative binomial distribution
  - poisson: Poisson distribution
- latent_distribution
One of the following:
  - normal: Normal distribution (default)
  - ln: Logistic normal distribution (Normal(0, I) transformed by softmax)
- max_epochs
Number of passes through the dataset during training.
- train_size
Size of training set in the range [0.0, 1.0]
- batch_size
Minibatch size to use during training.
- seed.use
An integer to generate reproducible outputs. Set seed.use = NULL to disable
- verbose
Print messages. Set to FALSE to disable
- verbose.scvi
Verbosity level of scVI. From quietest to talkiest: CRITICAL, ERROR, WARNING, INFO (default), DEBUG, NOTSET
- ...
For scVIIntegration(), additional arguments to be passed to scvi.model.SCVI, SCVI.setup_anndata or SCVI.train (see Details section). For scVIIntegration.fix(), all of the above
Value
A list containing:
- a new DimReduc of name new.reduction (key set to reduction.key) consisting of the latent space of the model with ndims.out dimensions.
When called via IntegrateLayers, a Seurat object with the new reduction and/or assay is returned
Details
This wrapper calls three python functions through reticulate. Find the scVI-specific arguments there:
model initiation: scvi.model.SCVI, which relies on scvi.module.VAE
anndata setup: SCVI.setup_anndata
training: SCVI.train
Note
This function requires the scvi-tools package to be installed (along with scipy)
References
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat Methods 15, 1053–1058 (2018).
Examples
if (FALSE) { # \dontrun{
# Load the packages that provide the functions used below
library(Seurat)
library(SeuratIntegrate)
# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = 'Method')
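# A hedged variation on the call above: model hyperparameters can be tuned
# through the arguments listed in Usage. The values chosen here, as well as
# the names 'integrated.scVI.nb' and 'scVInb_', are illustrative assumptions
# and not recommendations from the package authors. A separate reduction name
# and key keep the default 'integrated.scVI' result untouched.
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = 'Method',
                       new.reduction = 'integrated.scVI.nb',
                       reduction.key = 'scVInb_',
                       ndims.out = 30, n_layers = 2L, gene_likelihood = 'nb',
                       max_epochs = 100L, verbose.scvi = 'WARNING')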
# To enable cell label-guided correction, save the model, add other
# 'nuisance' factors and increase number of threads used:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
                       features = Features(obj), conda_env = 'scvi-tools',
                       layers = 'counts', groups = obj[[]], groups.name = "Method",
                       labels.name = "CellType",
                       categorical_covariate_keys = list("Experiment"),
                       continuous_covariate_keys = list("percent.mito"),
                       torch.intraop.threads = 8L,
                       model.save.dir = '~/Documents/scVI.model')
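# A minimal downstream sketch, assuming the default reduction name
# 'integrated.scVI' and the default ndims.out = 10 (hence dims = 1:10).
# The name 'umap.scVI' is an arbitrary choice made for this illustration.
obj <- FindNeighbors(obj, reduction = 'integrated.scVI', dims = 1:10)
obj <- FindClusters(obj)
obj <- RunUMAP(obj, reduction = 'integrated.scVI', dims = 1:10,
               reduction.name = 'umap.scVI')
DimPlot(obj, reduction = 'umap.scVI', group.by = 'Method')
# The latent coordinates are stored as a cells x ndims.out matrix:
head(Embeddings(obj, reduction = 'integrated.scVI'))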
} # }
