ENNET: inferring large gene regulatory networks from expression data using gradient boosting

BackgroundThe regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, and the regulation is often obscured by indirect connections.ResultsWe introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we propose a technique for using a distribution of the initial scores and information about knockouts to refine the predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved higher accuracy than the winners of those competitions and other established methods.ConclusionsSuperior accuracy achieved on the three different benchmark data sets shows that ENNET is a top contender in the task of network inference. It is a versatile method that uses information about which gene was knocked-out in which experiment if it is available, but remains the top performer even without such information. ENNET is available for download from https://github.com/slawekj/ennet under the GNU GPLv3 license.

[1]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[2]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[3]  Satoru Miyano,et al.  Inferring gene networks from time series microarray data using dynamic Bayesian networks , 2003, Briefings Bioinform..

[4]  Haidong Wang,et al.  Discovering molecular pathways from protein interaction and gene expression data , 2003, ISMB.

[5]  B. Ripley,et al.  Pattern Recognition , 1968, Nature.

[6]  Wei-Po Lee,et al.  Computational methods for discovering gene networks from expression data , 2009, Briefings Bioinform..

[7]  D. Floreano,et al.  Revealing strengths and weaknesses of methods for gene network inference , 2010, Proceedings of the National Academy of Sciences.

[8]  Tomasz Arodz,et al.  ADANET: inferring gene regulatory networks using ensemble classifiers , 2012, BCB.

[9]  Frank Emmert-Streib,et al.  Inferring the conservative causal core of gene regulatory networks , 2010, BMC Systems Biology.

[10]  Ralf Herwig,et al.  GeNGe: systematic generation of gene regulatory networks , 2009, Bioinform..

[11]  Paul P. Wang,et al.  Advances to Bayesian network inference for generating causal networks from observational biological data , 2004, Bioinform..

[12]  Ralf Zimmer,et al.  Inferring gene regulatory networks by ANOVA , 2012, Bioinform..

[13]  Kevin Kontos,et al.  Information-Theoretic Inference of Large Transcriptional Regulatory Networks , 2007, EURASIP J. Bioinform. Syst. Biol..

[14]  Timothy S Gardner,et al.  Reverse-engineering transcription control networks. , 2005, Physics of life reviews.

[15]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[16]  N. D. Clarke,et al.  Correction: Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PLoS ONE.

[17]  Aurélien Mazurie,et al.  Gene networks inference using dynamic Bayesian networks , 2003, ECCB.

[18]  Xing-Ming Zhao,et al.  NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference , 2013, Bioinform..

[19]  Florence d'Alché-Buc,et al.  OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks , 2013, Bioinform..

[20]  Ting Chen,et al.  Modeling Gene Expression with Differential Equations , 1998, Pacific Symposium on Biocomputing.

[21]  P. Geurts,et al.  Inferring Regulatory Networks from Expression Data Using Tree-Based Methods , 2010, PloS one.

[22]  Kevin Y. Yip,et al.  Improved Reconstruction of In Silico Gene Regulatory Networks by Integrating Knockout and Perturbation Data , 2010, PloS one.

[23]  Chris Wiggins,et al.  ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context , 2004, BMC Bioinformatics.

[24]  Jean-Philippe Vert,et al.  TIGRESS: Trustful Inference of Gene REgulation using Stability Selection , 2012, BMC Systems Biology.

[25]  George C. Runger,et al.  Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination , 2009, J. Mach. Learn. Res..

[26]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[27]  Julio Collado-Vides,et al.  RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units) , 2010, Nucleic Acids Res..

[28]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[29]  Patrik D'haeseleer,et al.  Linear Modeling of mRNA Expression Levels During CNS Development and Injury , 1998, Pacific Symposium on Biocomputing.

[30]  Margaret Werner-Washburne,et al.  A system for generating transcription regulatory networks with combinatorial control of transcription , 2008, Bioinform..

[31]  Adam A. Margolin,et al.  Reverse engineering cellular networks , 2006, Nature Protocols.

[32]  F. Doyle,et al.  A benchmark for methods in reverse engineering and model discrimination: problem formulation and solutions. , 2004, Genome research.

[33]  M. Reinders,et al.  Genetic network modeling. , 2002, Pharmacogenomics.

[34]  Gianluca Bontempi,et al.  minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information , 2008, BMC Bioinformatics.

[35]  J. Roach,et al.  Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[36]  Michal Linial,et al.  Using Bayesian Networks to Analyze Expression Data , 2000, J. Comput. Biol..

[37]  S. Shen-Orr,et al.  Network motifs in the transcriptional regulation network of Escherichia coli , 2002, Nature Genetics.

[38]  Pedro Mendes,et al.  Artificial gene networks for objective comparison of analysis algorithms , 2003, ECCB.

[39]  I S Kohane,et al.  Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[40]  J. Collins,et al.  Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles , 2007, PLoS biology.

[41]  Dario Floreano,et al.  Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods , 2009, J. Comput. Biol..

[42]  N. D. Clarke,et al.  Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges , 2010, PloS one.

[43]  Kathleen Marchal,et al.  SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms , 2006, BMC Bioinformatics.

[44]  Michael R. Brent,et al.  Benchmarking regulatory network reconstruction with GRENDEL , 2009, Bioinform..

[45]  Dario Floreano,et al.  GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods , 2011, Bioinform..

[46]  Richard E. Neapolitan,et al.  Learning Bayesian networks , 2007, KDD '07.

[47]  Diogo M. Camacho,et al.  Wisdom of crowds for robust gene network inference , 2012, Nature Methods.

[48]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[49]  A. Barabasi,et al.  Hierarchical Organization of Modularity in Metabolic Networks , 2002, Science.

[50]  Tom Michoel,et al.  Context-specific transcriptional regulatory network inference from global gene expression maps using double two-way t-tests , 2012, Bioinform..

[51]  J. Collins,et al.  Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling , 2003, Science.

[52]  Richard Bonneau,et al.  Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks , 2013, Bioinform..

[53]  Rainer Spang,et al.  Inferring cellular networks – a review , 2007, BMC Bioinformatics.

[54]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[55]  Claudio Cobelli,et al.  A Gene Network Simulator to Assess Reverse Engineering Algorithms , 2009, Annals of the New York Academy of Sciences.

[56]  Terence P. Speed,et al.  A comparison of normalization methods for high density oligonucleotide array data based on variance and bias , 2003, Bioinform..

[57]  D. di Bernardo,et al.  How to infer gene networks from expression profiles , 2007, Molecular systems biology.