Equation discovery for systems biology: finding the structure and dynamics of biological networks from time course data.

Reconstructing biological networks, such as metabolic and signaling networks, is at the heart of systems biology. Although many approaches exist for reconstructing network structure, few recover the full dynamic behavior of a network. We survey approaches of the latter kind that originate from computational scientific discovery, a subfield of machine learning. These approaches take as input measured time course data together with existing domain knowledge, such as partial knowledge of the network structure. We demonstrate their use on illustrative tasks of finding the complete dynamics of biological networks, including examples of rediscovering known networks and their dynamics, as well as examples of proposing models for unknown networks.
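To make the idea of recovering both structure and dynamics from time course data concrete, the following is a minimal toy sketch, not one of the surveyed systems. It assumes a single noise-free variable generated by first-order decay, a hypothetical three-term candidate library (`CANDIDATES`), derivatives estimated by finite differences, and a simple per-term penalty standing in for a proper model selection criterion. All function names and parameter values are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def simulate_decay(k=0.5, x0=2.0, t_end=5.0, n=51):
    """Generate an assumed noise-free time course for dx/dt = -k * x."""
    t = np.linspace(0.0, t_end, n)
    return t, x0 * np.exp(-k * t)


# Hypothetical library of candidate right-hand-side terms (an assumption).
CANDIDATES = {
    "constant": lambda x: np.ones_like(x),
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
}


def fit_structure(t, x, term_names):
    """Fit coefficients of one candidate structure by linear least squares."""
    # Estimate dx/dt numerically; drop the less accurate one-sided end points.
    dxdt = np.gradient(x, t)[1:-1]
    xi = x[1:-1]
    design = np.column_stack([CANDIDATES[name](xi) for name in term_names])
    coef, *_ = np.linalg.lstsq(design, dxdt, rcond=None)
    residual = dxdt - design @ coef
    return coef, float(residual @ residual)


def discover(t, x, penalty=1e-3):
    """Search all term subsets; score = squared error + penalty per term."""
    best = None
    names = list(CANDIDATES)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            coef, sse = fit_structure(t, x, subset)
            score = sse + penalty * len(subset)
            if best is None or score < best[0]:
                best = (score, subset, coef)
    return best


if __name__ == "__main__":
    t, x = simulate_decay()
    score, subset, coef = discover(t, x)
    print("selected terms:", subset)
    print("coefficients:", coef)
```

In this sketch the selected subset plays the role of the model structure and the fitted coefficients play the role of the dynamics. Domain knowledge, such as a partially known network structure, could be encoded by restricting which subsets the search is allowed to visit.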
