HarmonyIntegration: Run Harmony on Seurat's Assay5 object through IntegrateLayers

A wrapper to run Harmony on a multi-layered Seurat V5 object.

Can be called via SeuratIntegrate::HarmonyIntegration() or HarmonyIntegration.fix()

Usage

HarmonyIntegration(
  object,
  orig,
  groups = NULL,
  groups.name = NULL,
  layers = NULL,
  scale.layer = "scale.data",
  features = NULL,
  new.reduction = "harmony",
  dims = NULL,
  key = "harmony_",
  seed.use = 42L,
  theta = NULL,
  sigma = 0.1,
  lambda = NULL,
  nclust = NULL,
  ncores = 1L,
  max_iter = 10,
  early_stop = TRUE,
  plot_convergence = FALSE,
  .options = harmony_options(),
  verbose = TRUE,
  ...
)

HarmonyIntegration.fix(...)

Arguments

object: A Seurat object (or an Assay5 object if not called by IntegrateLayers)
orig: DimReduc object. Do not set this directly when calling through IntegrateLayers; use the orig.reduction argument instead.
groups: A named data frame with grouping information. Preferably one-column when groups.name = NULL
groups.name: Column name from groups data frame that stores grouping information. If groups.name = NULL, the first column is used
layers: Ignored unless groups = NULL, in which case the layers are used to create the grouping variable for batch-effect correction.
scale.layer: Ignored
features: Ignored
new.reduction: Name of the new integrated dimensional reduction
dims: Dimensions of dimensional reduction to use for integration. All used by default
key: Prefix for the dimension names computed by harmony.
seed.use: An integer to generate reproducible outputs. Set seed.use = NULL to disable
theta: Diversity clustering penalty parameter. Specify for each variable in vars_use. Default theta=2. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters.
sigma: Width of soft kmeans clusters. Default sigma=0.1. Sigma scales the distance from a cell to cluster centroids. Larger values of sigma result in cells being assigned to more clusters. Smaller values of sigma make soft kmeans clustering approach hard clustering.
lambda: Ridge regression penalty. Default lambda=1. Bigger values protect against overcorrection. If several covariates are specified, lambda can also be a vector, whose length must equal the number of variables to be corrected. In this scenario, each covariate level group will be assigned the scalars specified by the user. If set to NULL, harmony will start lambda estimation mode to determine lambdas automatically and try to minimize overcorrection (use with caution, still in beta testing).
nclust: Number of clusters in model. nclust=1 equivalent to simple linear regression.
ncores: Number of processors to be used for math operations when an optimized BLAS is available. If the BLAS does not support multithreading, this option has no effect. By default, ncores = 1, which runs as a single-threaded process. Although Harmony supports multiple cores, it is not optimized for multithreading. Increase this number for large datasets only if single-core performance is not adequate.
max_iter: Maximum number of rounds to run Harmony. One round of Harmony involves one clustering and one correction step.
early_stop: Enable early stopping for harmony. The harmonization process will stop when the change in the objective function between corrections drops below 1e-4.
plot_convergence: Whether to print the convergence plot of the clustering objective function. TRUE to plot, FALSE to suppress. This can be useful for debugging.
.options: Setting advanced parameters of RunHarmony. This must be the result from a call to `harmony_options`. See ?`harmony_options` for parameters not listed above and more details.
verbose: Print messages. Set to FALSE to disable
...: Ignored for HarmonyIntegration(), or all of the above for HarmonyIntegration.fix()
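
The shape expected by groups and groups.name can be sketched in base R as follows. This is only an illustration under stated assumptions: the cell names, the Experiment and Donor columns, and their values are hypothetical, and it assumes that "named" means the data frame's row names are cell names (in practice the data frame would typically be the object's metadata, obj[[]]).

```r
# Hypothetical cell names and batch labels, for illustration only
cells <- paste0("cell_", 1:6)
groups <- data.frame(
  Experiment = rep(c("exp1", "exp2"), each = 3),
  Donor = rep(c("d1", "d2", "d3"), times = 2),
  row.names = cells
)

# With groups.name = NULL the first column ("Experiment") would be used,
# per the argument description; naming the column makes the choice explicit
groups.name <- "Donor"

# The grouping variable that would drive batch correction, one label per cell
batch <- groups[[groups.name]]
names(batch) <- rownames(groups)
table(batch)
```

Passing a multi-column data frame together with an explicit groups.name avoids relying on column order.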

Value

The function itself returns a list containing:

a new DimReduc named new.reduction (key set to key) with a corrected cell embeddings matrix of length(dims) columns.

When called via IntegrateLayers, a Seurat object with the new reduction is returned.

Note

This function requires the harmony package to be installed

References

Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., Baglaenko, Y., Brenner, M., Loh, P. & Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). DOI

Examples

if (FALSE) { # \dontrun{
# Preprocessing
obj <- UpdateSeuratObject(SeuratData::LoadData("pbmcsca"))
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

# After preprocessing, we integrate layers based on the "Method" variable:
obj <- IntegrateLayers(object = obj, method = SeuratIntegrate::HarmonyIntegration,
                       verbose = TRUE)

# We can also change parameters such as the batch-effect variable.
# Here we change the groups variable, the number of dimensions used from the original
# PCA and minor options from `harmony_options()`:
harmonyOptions <- harmony::harmony_options()
harmonyOptions$max.iter.cluster <- 10   #  20 by default
harmonyOptions$block.size <- .1         # .05 by default
obj <- IntegrateLayers(object = obj, method = SeuratIntegrate::HarmonyIntegration,
                       dims = 1:30, plot_convergence = TRUE,
                       groups = obj[[]], groups.name = "Experiment",
                       new.reduction = "harmony_custom",
                       .options = harmonyOptions, verbose = TRUE)
} # }
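
As a possible follow-up to the examples above, the integrated reduction can be inspected and passed to downstream Seurat steps. This is a sketch rather than part of the package's documented workflow: it assumes obj is the Seurat object produced above with the "harmony_custom" reduction, and that the Seurat package is installed. The guard makes the block a no-op otherwise.

```r
# Assumes `obj` comes from the examples above and Seurat is installed;
# nothing is run otherwise
if (exists("obj") && requireNamespace("Seurat", quietly = TRUE)) {
  # Corrected embeddings: one row per cell, one column per dimension in dims
  emb <- Seurat::Embeddings(obj, reduction = "harmony_custom")
  print(dim(emb))

  # Build the neighbor graph, clusters, and UMAP on the integrated space
  obj <- Seurat::FindNeighbors(obj, reduction = "harmony_custom", dims = 1:30)
  obj <- Seurat::FindClusters(obj)
  obj <- Seurat::RunUMAP(obj, reduction = "harmony_custom", dims = 1:30)
}
```

Choosing the harmony reduction instead of "pca" in these calls is what makes clustering and visualization reflect the batch-corrected embedding.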