Paired single-cell multi-omics data integration with Mowgli

Profiling multiple molecular layers from the same set of cells has recently become possible, creating a growing need for multi-view learning methods able to analyze these data jointly. Here we present Multi-Omics Wasserstein inteGrative anaLysIs (Mowgli), a novel method for integrating paired multi-omics data with any type and number of omics. Mowgli combines integrative Nonnegative Matrix Factorization (NMF) with Optimal Transport (OT), improving both the clustering performance and the interpretability of integrative NMF. We apply Mowgli to multiple paired single-cell multi-omics datasets profiled with 10X Multiome, CITE-seq and TEA-seq. Our in-depth benchmark demonstrates that Mowgli's performance is competitive with the state of the art in cell clustering and superior to it in biological interpretability. Mowgli is implemented as a Python package integrated within the scverse ecosystem and is available at http://github.com/cantinilab/mowgli.
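
To make the idea of integrative NMF concrete, the following is a minimal NumPy sketch of a factorization in which every omics layer gets its own nonnegative dictionary while all layers share one cell embedding. It is not Mowgli's implementation: it assumes a plain squared-error (Frobenius) loss with standard multiplicative updates, whereas Mowgli replaces that loss with an Optimal Transport one. The function name `integrative_nmf`, its parameters, and the toy matrices `rna` and `adt` are hypothetical and chosen for illustration only.

```python
import numpy as np


def integrative_nmf(views, latent_dim=5, n_iter=200, eps=1e-9, seed=0):
    """Factorize each view X_p (features_p x cells) as H_p @ W with a shared W.

    Illustrative only: uses a Frobenius loss, not the OT loss used by Mowgli.
    """
    rng = np.random.default_rng(seed)
    n_cells = views[0].shape[1]
    # Shared nonnegative cell embedding (latent_dim x cells).
    W = rng.random((latent_dim, n_cells))
    # One nonnegative dictionary per omics layer (features_p x latent_dim).
    Hs = [rng.random((X.shape[0], latent_dim)) for X in views]
    losses = []
    for _ in range(n_iter):
        # Multiplicative update of each view-specific dictionary.
        for p, X in enumerate(views):
            Hs[p] *= (X @ W.T) / (Hs[p] @ (W @ W.T) + eps)
        # Multiplicative update of the shared embedding, pooling all views.
        numerator = sum(H.T @ X for H, X in zip(Hs, views))
        denominator = sum(H.T @ H for H in Hs) @ W + eps
        W *= numerator / denominator
        # Total squared reconstruction error across views.
        losses.append(
            sum(np.linalg.norm(X - H @ W) ** 2 for H, X in zip(Hs, views))
        )
    return Hs, W, losses


# Toy paired data: two nonnegative "omics" sharing the same 40 cells.
rng = np.random.default_rng(1)
true_W = rng.random((3, 40))
rna = rng.random((30, 3)) @ true_W  # hypothetical RNA-like view
adt = rng.random((10, 3)) @ true_W  # hypothetical protein-like view

Hs, W, losses = integrative_nmf([rna, adt], latent_dim=3)
print(W.shape, losses[0], losses[-1])
```

The columns of `W` play the role of the joint cell representation used for clustering, and the columns of each `H_p` are the per-omics factor loadings that support interpretation.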
