Cross-Linked Unified Embedding for cross-modality representation learning

Multi-modal learning is essential for understanding information in the real world. Jointly learning from multi-modal data enables global integration of both shared and modality-specific information, but current strategies often fail when observations from certain modalities are incomplete or missing for a subset of subjects. To learn comprehensive representations from such modality-incomplete data, we present a semi-supervised neural network model called CLUE (Cross-Linked Unified Embedding). Extending multi-modal variational autoencoders (VAEs), CLUE introduces cross-encoders that construct latent representations from modality-incomplete observations. Modality-incomplete data are common in genomics: human cells, for example, are tightly regulated across multiple related but distinct modalities such as DNA, RNA, and protein, which jointly define a cell's function. We benchmark CLUE on multi-modal single-cell measurements and show that it achieves superior performance in all assessed categories of the NeurIPS 2021 Multimodal Single-cell Data Integration Competition. While we focus on the analysis of single-cell genomic datasets, the proposed cross-linked embedding strategy readily applies to other cross-modality representation learning problems.
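To make the cross-encoder idea concrete, below is a minimal PyTorch sketch of a two-modality VAE with cross-linked encoders. It is an illustrative assumption of how such a model could be wired, not the authors' reference implementation: the class and function names (`CrossLinkedVAE`, `Encoder`, `reparam`) are hypothetical, the decoders are plain linear layers, and the loss is a standard reconstruction-plus-KL objective rather than CLUE's actual training objective.

```python
# Minimal sketch of a two-modality VAE with cross-encoders, in the spirit of
# CLUE. All names and the exact wiring are illustrative assumptions, not the
# authors' reference implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an input modality to the mean and log-variance of a latent Gaussian."""
    def __init__(self, in_dim, latent_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class CrossLinkedVAE(nn.Module):
    """Each modality has its own encoder plus a cross-encoder that maps the
    *other* modality into the same latent component, so a cell observed in
    only one modality still receives a full latent representation."""
    def __init__(self, dim_a, dim_b, latent_dim=32):
        super().__init__()
        self.enc_a = Encoder(dim_a, latent_dim)        # modality A -> z_a
        self.enc_b = Encoder(dim_b, latent_dim)        # modality B -> z_b
        self.cross_a = Encoder(dim_b, latent_dim)      # B -> z_a (cross-link)
        self.cross_b = Encoder(dim_a, latent_dim)      # A -> z_b (cross-link)
        self.dec_a = nn.Linear(2 * latent_dim, dim_a)  # [z_a, z_b] -> A
        self.dec_b = nn.Linear(2 * latent_dim, dim_b)  # [z_a, z_b] -> B

    @staticmethod
    def reparam(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x_a=None, x_b=None):
        assert x_a is not None or x_b is not None, "need at least one modality"
        # Use the direct encoder when a modality is observed; otherwise fall
        # back to the cross-encoder fed by the modality that is present.
        mu_a, lv_a = self.enc_a(x_a) if x_a is not None else self.cross_a(x_b)
        mu_b, lv_b = self.enc_b(x_b) if x_b is not None else self.cross_b(x_a)
        z = torch.cat([self.reparam(mu_a, lv_a), self.reparam(mu_b, lv_b)], -1)
        return self.dec_a(z), self.dec_b(z), (mu_a, lv_a), (mu_b, lv_b)

def vae_loss(model, x_a, x_b):
    """Reconstruction + KL objective for fully observed pairs (a placeholder
    for CLUE's actual training objective)."""
    rec_a, rec_b, (mu_a, lv_a), (mu_b, lv_b) = model(x_a, x_b)
    rec = F.mse_loss(rec_a, x_a) + F.mse_loss(rec_b, x_b)
    kl = -0.5 * torch.mean(1 + lv_a - mu_a.pow(2) - lv_a.exp()) \
         - 0.5 * torch.mean(1 + lv_b - mu_b.pow(2) - lv_b.exp())
    return rec + kl
```

In this sketch, a cell observed only in modality A still yields a full embedding via `model(x_a=x, x_b=None)`, because the cross-encoder infers the missing modality's latent component from the observed one; this is the property that lets a single latent space cover modality-incomplete data.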
