Integrated Multi-Omics Analyses in Oncology: A Review of Machine Learning Methods and Tools

In recent years, high-throughput sequencing technologies have provided unprecedented opportunities to characterize cancer samples at multiple molecular levels. Integrating and analyzing these multi-omics datasets is a critical step toward gaining actionable knowledge in a precision medicine framework. This paper explores recent data-driven methodologies that have been developed and applied to address major challenges of stratified medicine in oncology, including patient phenotyping, biomarker discovery, and drug repurposing. We systematically retrieved peer-reviewed journal articles published from 2014 to 2019, then selected and thoroughly described the tools presenting the most promising innovations in three areas: the integration of heterogeneous data, the machine learning methodologies that successfully tackle the complexity of multi-omics data, and the frameworks that deliver actionable results for clinical practice. The review is organized according to the applied methods: Deep Learning, Network-Based Methods, Clustering, Feature Extraction and Transformation, and Factorization. We provide an overview of the tools available in each methodological group and underline the relationships among the different categories. Our analysis shows how multi-omics datasets can be exploited to drive precision oncology, and it also highlights current limitations in the development of multi-omics data integration.
