Inference of Gene Regulatory Networks Based on a Universal Minimum Description Length

The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that requires a tuning parameter to artificially balance the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. To overcome this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov's structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements.
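
To make the idea of a parameter-free description length concrete, the following minimal sketch computes the normalized maximum likelihood (NML) code length for the simplest possible model class, a Bernoulli source over a binary sequence. It is an illustration only, not the method of the paper, which builds on an NML model for Boolean regression; the function name `bernoulli_nml_codelength` is hypothetical. The point it shows is that the model cost (the log of the normalizer) and the error cost (the negative maximized log-likelihood) are added on the same scale, in bits, with no tuning weight between them.

```python
from math import comb, log2


def _max_loglik_bits(k, n):
    """log2 of the maximized Bernoulli likelihood for k ones in n symbols.

    Uses the convention 0 * log2(0) = 0.
    """
    total = 0.0
    if k > 0:
        total += k * log2(k / n)
    if k < n:
        total += (n - k) * log2((n - k) / n)
    return total


def bernoulli_nml_codelength(bits):
    """NML code length (in bits) of a binary sequence under the Bernoulli class.

    Illustrative assumption: a plain Bernoulli model, not the Boolean
    regression model used in the paper.
    """
    n = len(bits)
    k = sum(bits)
    # Normalizer: sum of maximized likelihoods over all 2^n sequences,
    # grouped by their number of ones j.
    normalizer = sum(comb(n, j) * 2 ** _max_loglik_bits(j, n) for j in range(n + 1))
    error_cost = -_max_loglik_bits(k, n)   # fit of the best parameter to the data
    model_cost = log2(normalizer)          # price of having been free to choose it
    return error_cost + model_cost


if __name__ == "__main__":
    regular = [0] * 16
    mixed = [0, 1] * 8
    print(bernoulli_nml_codelength(regular))
    print(bernoulli_nml_codelength(mixed))
```

Because the code lengths come from a normalized distribution, they satisfy the Kraft equality over all sequences of a given length, which is what allows description lengths of competing models to be compared directly.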
