Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods

High-throughput technologies have led to the accumulation of diverse types of molecular data. These data differ in type (discrete, real-valued, string, etc.) and come in various formats and sizes. Commonly used biological datasets for studying the cellular mechanisms of living organisms include gene expression, miRNA expression, protein-DNA binding data (ChIP-seq/ChIP-chip), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data. Each of them provides a unique, complementary, and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offers system-level biological insight for predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for the collective mining of information from an integrated dataset. Here we survey integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
