Unsupervised topological alignment for single-cell multi-omics integration

Abstract

Motivation: Single-cell multi-omics data provide a comprehensive molecular view of cells. However, single-cell multi-omics datasets consist of unpaired cells measured with distinct, unmatched features across modalities, which makes data integration challenging.

Results: We present UnionCom, a novel algorithm for the unsupervised topological alignment of single-cell multi-omics datasets. UnionCom does not require any correspondence information, either among cells or among features. It first embeds the intrinsic low-dimensional structure of each single-cell dataset into a distance matrix of the cells within that dataset, and then aligns the cells across datasets by matching the distance matrices via a matrix optimization method. Finally, it projects the distinct, unmatched features of the datasets into a common embedding space so that the features of the aligned cells become comparable. To match low-dimensional structures that are related by complex nonlinear geometric distortions, UnionCom introduces and adjusts a global scaling parameter on the distance matrices so that similar topological structures can be aligned. It does not require one-to-one correspondence among cells across datasets, and it can accommodate samples with dataset-specific cell types. UnionCom outperforms state-of-the-art methods on both simulated and real single-cell multi-omics datasets, and it is robust to parameter choices and to subsampling of features.

Availability and implementation: UnionCom software is available at https://github.com/caokai1073/UnionCom.

Supplementary information: Supplementary data are available at Bioinformatics online.
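The first two steps described above (building a per-dataset distance matrix, then matching the two matrices under a global scaling parameter) can be illustrated with a minimal NumPy sketch. It is not the UnionCom implementation. The k-nearest-neighbour geodesic distances, the max-normalisation, the clip-and-normalise row projection, the backtracking gradient step, and all function names are assumptions made for illustration, and the final projection of features into a common embedding space is omitted. The scaling parameter `alpha` is refitted in closed form for each candidate matching matrix `F`.

```python
import numpy as np


def geodesic_distances(X, k=5):
    """Approximate geodesic distances between cells on a k-NN graph (assumed choice)."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    G = np.full((n, n), np.inf)
    nn = np.argsort(E, axis=1)[:, 1:k + 1]         # k nearest neighbours, skipping self
    rows = np.repeat(np.arange(n), k)
    G[rows, nn.ravel()] = E[rows, nn.ravel()]
    G = np.minimum(G, G.T)                         # symmetrise the graph
    np.fill_diagonal(G, 0.0)
    for m in range(n):                             # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    finite = np.isfinite(G)
    G[~finite] = G[finite].max()                   # cap distances of disconnected pairs
    return G / G.max()                             # normalise to [0, 1] (assumption)


def scaled_loss(F, D1, D2):
    """Return min over alpha of ||D1 - alpha * F D2 F^T||_F^2, the best alpha, and the residual."""
    M = F @ D2 @ F.T
    denom = (M * M).sum()
    alpha = (D1 * M).sum() / denom if denom > 0 else 0.0
    R = alpha * M - D1
    return (R * R).sum(), alpha, R


def project_rows(F):
    """Heuristic projection: clip to non-negative values, then make each row sum to 1."""
    F = np.clip(F, 0.0, None)
    s = F.sum(axis=1, keepdims=True)
    # Rows clipped to all zeros fall back to a uniform distribution.
    return np.where(s > 0, F / np.where(s > 0, s, 1.0), 1.0 / F.shape[1])


def match_distance_matrices(D1, D2, n_iter=200, step=1.0, seed=0):
    """Projected gradient descent on a soft cell-to-cell matching matrix F."""
    rng = np.random.default_rng(seed)
    n1, n2 = D1.shape[0], D2.shape[0]
    F = project_rows(np.full((n1, n2), 1.0 / n2) + 1e-3 * rng.random((n1, n2)))
    loss, alpha, R = scaled_loss(F, D1, D2)
    history = [loss]
    for _ in range(n_iter):
        grad = 4.0 * alpha * R @ F @ D2            # gradient w.r.t. F for symmetric D1, D2
        F_new = project_rows(F - step * grad)
        new_loss, new_alpha, new_R = scaled_loss(F_new, D1, D2)
        if new_loss < loss:                        # accept only improving steps
            F, loss, alpha, R = F_new, new_loss, new_alpha, new_R
        else:
            step *= 0.5                            # otherwise shrink the step size
        history.append(loss)
    return F, alpha, history


# Toy data: two "modalities" measuring the same 1-D trajectory with different features.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 30))
X1 = np.c_[t, t ** 2] @ rng.normal(size=(2, 10))            # 30 cells x 10 features
X2 = np.c_[np.cos(t), np.sin(t)] @ rng.normal(size=(2, 6))  # 30 cells x 6 features
D1, D2 = geodesic_distances(X1), geodesic_distances(X2)
F, alpha, history = match_distance_matrices(D1, D2)
```

Each row of `F` is a soft assignment of one cell in the first dataset to the cells of the second, so the datasets need not contain the same number of cells. Because only improving steps are accepted, `history` is non-increasing, but the sketch makes no claim about reaching a global optimum or recovering the true correspondence.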
