Inferring gene regulatory networks from time series data using the minimum description length principle

MOTIVATION: A central question in reverse engineering genetic networks is how to determine the dependencies and regulatory relationships among genes. This paper addresses the problem of inferring genetic regulatory networks from time-series gene-expression profiles. Adopting a probabilistic modeling framework compatible with the family of models represented by dynamic Bayesian networks and probabilistic Boolean networks, it proposes a network inference algorithm that recovers not only the direct gene connectivity but also the direction of regulation.

RESULTS: The proposed novel network inference algorithm is based on the minimum description length principle; it greatly shrinks the search space for graphical solutions and achieves a good trade-off between model complexity and data fitting. Simulation results show that the algorithm performs well on synthetic networks. Compared with existing state-of-the-art results in the literature, the proposed algorithm performs better in efficiency, accuracy, robustness and scalability. Given a time-series dataset for Drosophila melanogaster, the paper proposes a genetic regulatory network involved in Drosophila's muscle development.

AVAILABILITY: Available from the authors upon request.
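The trade-off between model complexity and data fitting mentioned above can be illustrated with a small sketch. This is not the authors' algorithm, whose description-length formula is not given in this abstract; it assumes a generic two-part code (a BIC-style parameter cost plus the empirical conditional entropy of a gene given its candidate regulators at the previous time step) and discretized expression levels. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_entropy(child, parents):
    """Empirical H(child | parents) in bits; `parents` is a list of tuples."""
    n = len(child)
    joint = Counter(zip(parents, child))
    marginal = Counter(parents)
    h = 0.0
    for (p, _c), count in joint.items():
        h -= (count / n) * math.log2(count / marginal[p])
    return h


def mdl_score(series, target, parent_set, levels=2):
    """Assumed two-part description length (bits) for `target` at t+1
    given the genes in `parent_set` at t. Smaller is better."""
    n = len(series) - 1  # number of observed transitions
    child = [series[t + 1][target] for t in range(n)]
    parents = [tuple(series[t][g] for g in parent_set) for t in range(n)]
    data_len = n * conditional_entropy(child, parents)
    # Assumption: one free parameter per parent configuration and
    # non-reference child level, each costing (1/2) log2(n) bits.
    n_params = (levels - 1) * levels ** len(parent_set)
    model_len = 0.5 * n_params * math.log2(n)
    return model_len + data_len


def best_parents(series, target, max_parents=2):
    """Exhaustively score all parent sets up to `max_parents` genes."""
    genes = range(len(series[0]))
    candidates = [ps for k in range(max_parents + 1)
                  for ps in combinations(genes, k)]
    return min(candidates, key=lambda ps: mdl_score(series, target, ps))


def toy_series(length=40):
    """Hypothetical binary data: gene 2 at t+1 is (gene 0 AND gene 1) at t."""
    series = [(0, 0, 0)]
    for t in range(1, length):
        g0_prev, g1_prev, _ = series[-1]
        series.append((t % 2, (t // 2) % 2, g0_prev & g1_prev))
    return series


if __name__ == "__main__":
    data = toy_series()
    print(best_parents(data, target=2))
```

On this toy series the pair of genes 0 and 1 explains gene 2 exactly, so its data cost is zero and it wins despite a larger model cost than a single regulator or none. Because parents are taken from time t and the child from time t+1, the selected edges are directed, which mirrors the abstract's point about recovering the direction of regulation. The exhaustive scan is only practical for a handful of genes; the abstract states that the proposed algorithm shrinks this search space, but how it does so is not described here.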
