An empirical comparison of popular structure learning algorithms with a view to gene network inference

Abstract

In this work, we study the performance of different structure learning algorithms in the context of inferring gene networks from transcription data. We consider representatives of different structure learning approaches: some perform unrestricted searches, such as the PC algorithm and the Gobnilp method, while others introduce prior information on the structure, such as the K2 algorithm, which requires a node ordering. Competing methods are evaluated both in terms of their predictive accuracy and in terms of their ability to reconstruct the true underlying network. We also consider a real-data application based on an experiment performed at the University of Padova.
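The abstract does not say which metrics measure the "ability to reconstruct the true underlying network". The sketch below shows one common choice and is not taken from the study. It assumes that both the true and the estimated graphs are DAGs given as lists of (parent, child) edges. It compares their skeletons with precision, recall and F1, and counts edge differences with a structural Hamming distance (SHD) in which a missing, extra or reversed edge each costs one unit. The function names are hypothetical. CPDAGs returned by the PC algorithm, which may contain undirected edges, would need extra handling that this sketch does not provide.

```python
from itertools import combinations


def edge_state(edges, u, v):
    """Return the orientation of the edge between u and v in a DAG edge set."""
    if (u, v) in edges:
        return "->"
    if (v, u) in edges:
        return "<-"
    return None


def compare_structures(true_edges, est_edges, nodes):
    """Compare an estimated DAG with the true DAG (hypothetical helper).

    Edges are (parent, child) tuples. Skeleton metrics ignore direction;
    the SHD adds one for every node pair whose edge is missing, extra or reversed.
    """
    true_edges, est_edges = set(true_edges), set(est_edges)
    # Skeletons: undirected versions of the edge sets.
    true_skel = {frozenset(e) for e in true_edges}
    est_skel = {frozenset(e) for e in est_edges}
    tp = len(true_skel & est_skel)
    precision = tp / len(est_skel) if est_skel else 0.0
    recall = tp / len(true_skel) if true_skel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # SHD: compare the edge state of every unordered node pair.
    shd = sum(
        edge_state(true_edges, u, v) != edge_state(est_edges, u, v)
        for u, v in combinations(nodes, 2)
    )
    return {"precision": precision, "recall": recall, "f1": f1, "shd": shd}


if __name__ == "__main__":
    true_dag = [("A", "B"), ("B", "C"), ("A", "D")]
    est_dag = [("A", "B"), ("C", "B"), ("B", "D")]
    print(compare_structures(true_dag, est_dag, ["A", "B", "C", "D"]))
```

In this toy example the estimate recovers two of the three true adjacencies, so precision and recall are both 2/3. The SHD is 3 because of one missing edge (A, D), one reversed edge (B, C) and one extra edge (B, D). Skeleton metrics and SHD are reported separately because a method can find the right adjacencies while still orienting some of them wrongly.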
