MaCroDNA: Accurate integration of single-cell DNA and RNA data for a deeper understanding of tumor heterogeneity

Cancers develop and progress as mutations accumulate. With the advent of single-cell DNA and RNA sequencing, researchers can observe these mutations and their transcriptomic effects, and predict proteomic changes, with remarkable temporal and spatial precision. However, to connect genomic mutations with their transcriptomic and proteomic consequences, cells that have only DNA data or only RNA data must be mapped to a common domain. For this purpose, we present MaCroDNA, a novel method that applies maximum weighted bipartite matching to per-gene read counts from single-cell DNA and RNA-seq data. Using ground truth information from colorectal cancer data, we demonstrate that MaCroDNA is markedly more accurate and faster than existing methods. To illustrate the utility of single-cell data integration in cancer research, we propose, based on results derived using MaCroDNA, that genomic mutations of large effect size contribute increasingly to differential expression between cells as Barrett’s esophagus progresses to esophageal cancer.
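The matching step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `match_cells`, the use of Pearson correlation between per-gene count profiles as the edge weight, and the one-to-one assignment solved with SciPy's `linear_sum_assignment` are all assumptions made for illustration. The abstract states only that maximum weighted bipartite matching is applied to per-gene read counts.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_cells(rna_counts, dna_counts):
    """Pair RNA cells with DNA cells by maximum weighted bipartite matching.

    rna_counts: array of shape (n_rna_cells, n_genes)
    dna_counts: array of shape (n_dna_cells, n_genes), same gene order
    Returns (pairs, weights), where pairs is a list of (rna_idx, dna_idx)
    and weights is the (n_rna_cells, n_dna_cells) correlation matrix.
    """
    rna = np.asarray(rna_counts, dtype=float)
    dna = np.asarray(dna_counts, dtype=float)

    def standardize(x):
        # Center each cell's profile and scale it to unit length, so that
        # a dot product between two profiles equals their Pearson correlation.
        x = x - x.mean(axis=1, keepdims=True)
        norm = np.linalg.norm(x, axis=1, keepdims=True)
        norm[norm == 0] = 1.0  # constant profiles get correlation 0
        return x / norm

    # Edge weights of the bipartite graph (assumed: Pearson correlation).
    weights = standardize(rna) @ standardize(dna).T

    # Solve the assignment problem, maximizing the total weight.
    # With unequal cell numbers, only min(n_rna, n_dna) pairs are returned.
    rna_idx, dna_idx = linear_sum_assignment(weights, maximize=True)
    return list(zip(rna_idx.tolist(), dna_idx.tolist())), weights
```

Calling `match_cells(rna, dna)` on two count matrices that share a gene order returns one DNA partner per RNA cell (or fewer pairs when the DNA side is smaller). A strictly one-to-one assignment is a simplification chosen here; when the two modalities contain very different numbers of cells, a different matching scheme may be needed.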
