Liam tackles complex multimodal single-cell data integration challenges

Multi-omics characterization of single cells holds outstanding potential for profiling gene regulatory states of thousands of cells and their dynamics and relations. How to integrate multimodal data is an open problem, especially when aiming to combine data from multiple sources or conditions containing biological and technical variation. We introduce liam, a flexible model for the simultaneous horizontal and vertical integration of paired single-cell multimodal data. Liam learns a joint low-dimensional representation of two concurrently measured modalities, which proves beneficial when the information content or quality of the modalities differs. Its integration accounts for complex batch effects using a tuneable combination of conditional and adversarial training and can be optimized using replicate information while retaining selected biological variation. We demonstrate liam’s superior performance on multiple multimodal data sets, including Multiome and CITE-seq data. Detailed benchmarking experiments illustrate the complexities and challenges remaining for integration and the meaningful assessment of its success.
