A wrapper to run scVI on a multi-layered Seurat V5 object. Requires a conda environment with scvi-tools and the necessary dependencies.
Can be called via SeuratIntegrate::scVIIntegration() or scVIIntegration.fix().
Recommendations: use raw counts and all features (features = Features(object), layers = "counts").
Usage
scVIIntegration(
  object,
  groups = NULL,
  groups.name = NULL,
  labels.name = NULL,
  features = NULL,
  layers = "counts",
  scale.layer = "scale.data",
  conda_env = NULL,
  new.reduction = "integrated.scVI",
  reduction.key = "scVIlatent_",
  torch.intraop.threads = 4L,
  torch.interop.threads = NULL,
  model.save.dir = NULL,
  ndims.out = 10,
  n_hidden = 128L,
  n_layers = 1L,
  dropout_rate = 0.1,
  dispersion = c("gene", "gene-batch", "gene-label", "gene-cell"),
  gene_likelihood = c("zinb", "nb", "poisson"),
  latent_distribution = c("normal", "ln"),
  max_epochs = NULL,
  train_size = 0.9,
  batch_size = 128L,
  seed.use = 42L,
  verbose = TRUE,
  verbose.scvi = c("INFO", "NOTSET", "DEBUG", "WARNING", "ERROR", "CRITICAL"),
  ...
)
scVIIntegration.fix(...)
Arguments
- object
A Seurat object (or an Assay5 object if not called by IntegrateLayers)
- groups
A named data frame with grouping information.
- groups.name
Column name from the groups data frame that stores grouping information. If groups.name = NULL, the first column is used.
- labels.name
Column name from the groups data frame that stores cell label information. If labels.name = NULL, all cells are assigned the same label.
- features
Vector of feature names to input to the integration method. When features = NULL (default), the VariableFeatures are used. To pass all features, use the output of Features().
- layers
Name of the layers to use in the integration. 'counts' is highly recommended
- scale.layer
Name of the scaled layer in Assay
- conda_env
Path to conda environment to run scVI (should also contain the scipy python module). By default, uses the conda environment registered for scVI in the conda environment manager
- new.reduction
Name of the new integrated dimensional reduction
- reduction.key
Key for the new integrated dimensional reduction
- torch.intraop.threads
Number of intra-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_threads().
- torch.interop.threads
Number of inter-op threads available to torch when training on CPU instead of GPU. Set via torch.set_num_interop_threads(). Can only be changed once, on first call.
- model.save.dir
Path to a directory to save the model to. Uses SCVI.save(). Does not save anndata. Note that neither the trainer optimizer state nor the trainer history are saved. model.save.dir = NULL (default) disables saving the model.
- ndims.out
Number of dimensions for the new.reduction output. Corresponds to the n_latent argument in the original API of SCVI.
- n_hidden
Number of nodes per hidden layer.
- n_layers
Number of hidden layers used for encoder and decoder NNs.
- dropout_rate
Dropout rate for neural networks.
- dispersion
One of the following:
gene: dispersion parameter of NB is constant per gene across cells (default)
gene-batch: dispersion can differ between different batches
gene-label: dispersion can differ between different labels
gene-cell: dispersion can differ for every gene in every cell
- gene_likelihood
One of the following:
zinb: Zero-inflated negative binomial distribution (default)
nb: Negative binomial distribution
poisson: Poisson distribution
- latent_distribution
One of the following:
normal: Normal distribution (default)
ln: Logistic normal distribution (Normal(0, I) transformed by softmax)
- max_epochs
Number of passes through the dataset during training.
- train_size
Size of training set in the range [0.0, 1.0]
- batch_size
Minibatch size to use during training.
- seed.use
An integer to generate reproducible outputs. Set seed.use = NULL to disable.
- verbose
Print messages. Set to FALSE to disable.
- verbose.scvi
Verbosity level of scVI. From quietest to talkiest: CRITICAL, ERROR, WARNING, INFO (default), DEBUG, NOTSET
- ...
For scVIIntegration(), additional arguments to be passed to scvi.model.SCVI, SCVI.setup_anndata or SCVI.train (see Details section). For scVIIntegration.fix(), all of the above.
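Several arguments in Usage (dispersion, gene_likelihood, latent_distribution, verbose.scvi) default to a vector of choices, with the documented default listed first. The base R sketch below illustrates that convention, assuming the choice is resolved with match.arg(). This page does not confirm how the wrapper resolves it, and pick_dispersion() is a hypothetical helper, not part of SeuratIntegrate.

```r
# Hypothetical helper mirroring the 'dispersion' argument shown in Usage.
# Assumption: the choice vector is resolved with match.arg().
pick_dispersion <- function(dispersion = c("gene", "gene-batch",
                                           "gene-label", "gene-cell")) {
  match.arg(dispersion)
}

pick_dispersion()              # nothing supplied: first choice, "gene"
pick_dispersion("gene-batch")  # explicit choice: "gene-batch"
```

Under this convention, a value outside the listed choices raises an error instead of being passed on.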
Value
A list containing:
a new DimReduc of name new.reduction (key set to reduction.key) consisting of the latent space of the model with ndims.out dimensions.
When called via IntegrateLayers, a Seurat object with the new reduction and/or assay is returned.
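As a hedged illustration of how the returned reduction might be used downstream, the sketch below wraps standard Seurat calls in a hypothetical helper, use_scvi_reduction(). It assumes obj was integrated via IntegrateLayers with the defaults new.reduction = "integrated.scVI" and ndims.out = 10. The helper name and the clustering steps are illustrative only.

```r
# Hypothetical helper (not part of SeuratIntegrate): cluster and embed cells
# using the scVI latent space instead of PCA.
use_scvi_reduction <- function(obj, reduction = "integrated.scVI", dims = 1:10) {
  # Latent coordinates: one row per cell, one column per latent dimension
  latent <- SeuratObject::Embeddings(obj, reduction = reduction)
  stopifnot(max(dims) <= ncol(latent))

  # Standard Seurat workflow, pointed at the integrated reduction
  obj <- Seurat::FindNeighbors(obj, reduction = reduction, dims = dims)
  obj <- Seurat::FindClusters(obj)
  obj <- Seurat::RunUMAP(obj, reduction = reduction, dims = dims)
  obj
}
```

A call such as obj <- use_scvi_reduction(obj) would then return the object with clusters and a UMAP computed from the integrated space.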
Details
This wrapper calls three Python functions through reticulate. Find the scVI-specific arguments there:
model initiation: scvi.model.SCVI, which relies on scvi.module.VAE
anndata setup: SCVI.setup_anndata
training: SCVI.train
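The three calls listed above can be sketched directly with reticulate. This is a minimal illustration of the underlying scvi-tools API, not the wrapper's actual implementation. run_scvi_sketch() is a hypothetical helper, adata is assumed to be an AnnData object (as a reticulate Python object) holding raw counts, and "batch" is an assumed column name in adata.obs.

```r
# Hypothetical helper (not part of SeuratIntegrate). Assumes 'adata' is an
# AnnData Python object with raw counts and a batch column in adata.obs.
run_scvi_sketch <- function(adata, conda_env = "scvi-tools", batch_key = "batch") {
  reticulate::use_condaenv(conda_env, required = TRUE)
  scvi <- reticulate::import("scvi")

  # anndata setup: register the batch column (done before building the model)
  scvi$model$SCVI$setup_anndata(adata, batch_key = batch_key)

  # model initiation: values mirror the defaults shown in Usage
  model <- scvi$model$SCVI(
    adata,
    n_hidden = 128L, n_latent = 10L, n_layers = 1L,
    dropout_rate = 0.1, dispersion = "gene",
    gene_likelihood = "zinb", latent_distribution = "normal"
  )

  # training
  model$train(train_size = 0.9, batch_size = 128L)

  # Latent space (cells x n_latent), the basis of the new reduction
  model$get_latent_representation()
}
```

Arguments supplied through ... in scVIIntegration() correspond to extra keyword arguments of these three calls.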
Note
This function requires the scvi-tools package to be installed (along with scipy)
References
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat Methods 15, 1053–1058 (2018).
Examples
if (FALSE) { # \dontrun{
# Preprocessing
obj <- SeuratData::LoadData("pbmcsca")
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
# After preprocessing, we integrate layers:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
       features = Features(obj), conda_env = 'scvi-tools',
       layers = 'counts', groups = obj[[]], groups.name = 'Method')
# To enable cell label-guided correction, save the model, add other
# 'nuisance' factors and increase the number of threads used:
obj <- IntegrateLayers(object = obj, method = scVIIntegration,
       features = Features(obj), conda_env = 'scvi-tools',
       layers = 'counts', groups = obj[[]], groups.name = "Method",
       labels.name = "CellType",
       categorical_covariate_keys = list("Experiment"),
       continuous_covariate_keys = list("percent.mito"),
       torch.intraop.threads = 8L, model.save.dir = '~/Documents/scVI.model')
} # }