On fusion methods for knowledge discovery from multi-omics datasets

Recent years have seen a growing tendency to measure a biological sample on multiple omics scales, in order to understand comprehensively how biological activities at different levels are perturbed by genetic variants, environments, and their interactions. This trend raises substantial challenges for data integration and fusion; fusion is a specific type of integration that applies a uniform method in a scalable manner to solve the biological problems that the multi-omics measurements target. Fusion-based analysis has advanced rapidly in the past decade, driven by applications and by theoretical breakthroughs in mathematics, statistics, and computer science. We briefly review these methods from methodological and mathematical perspectives and categorize them into three types of approaches: data fusion (defined more narrowly here than the general data fusion concept), model fusion, and mixed fusion. For each category we present at least one typical example to illustrate the characteristics, principles, and applications of the methods, and we discuss the gaps and potential issues for future studies.
