Synthesis of biological models from mutation experiments

Executable biology presents new challenges to formal methods. This paper addresses two problems that cell biologists face when developing formally analyzable models. First, we show how to automatically synthesize a concurrent in-silico model of cell development from in-vivo experiments that record how particular mutations influence the experimental outcome. Synthesis under mutations is unique because mutations may produce non-deterministic outcomes (presumably by introducing races between competing signaling pathways in the cells), and the synthesized model must be able to replay all of these outcomes in order to faithfully describe the modeled cellular processes. In contrast, a "regular" concurrent program is correct if it picks any one outcome allowed by the non-deterministic specification. We developed synthesis algorithms and used them to synthesize a model of cell fate determination in the nematode C. elegans. A version of this model previously took systems biologists months to develop. Second, we address the problem of under-constrained specifications, which arise from incomplete sets of mutation experiments. An under-constrained specification admits several distinct models, each explaining the same phenomenon differently, so addressing the ambiguity of a specification corresponds to analyzing the space of plausible models. We develop algorithms for detecting ambiguity in specifications, i.e., deciding whether there exist alternative models that would produce different fates on some unperformed experiment, and for removing redundancy from specifications, i.e., computing minimal non-ambiguous specifications. Additionally, we develop a modeling language and embed it into Scala. We describe how this language design and embedding allow us to build an efficient synthesizer. For our C. elegans case study, we infer two observationally equivalent models that express different biological hypotheses through different protein interactions. One of these hypotheses was previously unknown to biologists.
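The two problems described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's algorithm or its Scala-embedded language: the two pathways `ind` and `lat`, the three fate names, the inhibition flags that make up a "model", and the brute-force enumeration are all hypothetical choices made for illustration. A model is accepted only if the set of fates it reaches over all interleavings equals the observed set for every experiment, and a specification is flagged as ambiguous when the accepted models disagree on some unperformed experiment.

```python
from itertools import combinations, permutations, product

# Hypothetical toy setting: two competing signaling pathways in one cell.
PATHWAYS = ("ind", "lat")
FATE_OF = {"ind": "primary", "lat": "secondary"}

def run(model, knocked_out, schedule):
    """Simulate one interleaving of the pathways.

    model[p] is True if pathway p, once it fires, inhibits the other pathway.
    """
    fate, inhibited = "tertiary", set()
    for p in schedule:
        if p in knocked_out or p in inhibited:
            continue  # a mutated or inhibited pathway never fires
        fate = FATE_OF[p]
        if model[p]:
            inhibited.update(q for q in PATHWAYS if q != p)
    return fate

def outcomes(model, knocked_out):
    """All fates the model can reach under some schedule (the races)."""
    return frozenset(run(model, knocked_out, s) for s in permutations(PATHWAYS))

def synthesize(spec):
    """Return every candidate model that replays exactly the observed fates."""
    candidates = [dict(zip(PATHWAYS, bits))
                  for bits in product([False, True], repeat=len(PATHWAYS))]
    return [m for m in candidates
            if all(outcomes(m, ko) == fates for ko, fates in spec.items())]

def ambiguous_experiments(spec):
    """Unperformed knockouts on which the consistent models disagree."""
    models = synthesize(spec)
    experiments = [frozenset(c) for r in range(len(PATHWAYS) + 1)
                   for c in combinations(PATHWAYS, r)]
    return [ko for ko in experiments
            if ko not in spec and len({outcomes(m, ko) for m in models}) > 1]

# Incomplete specification: only the two single-knockout experiments were run.
partial = {
    frozenset({"lat"}): frozenset({"primary"}),
    frozenset({"ind"}): frozenset({"secondary"}),
}
# Adding the wild-type experiment, which shows a non-deterministic outcome.
full = dict(partial)
full[frozenset()] = frozenset({"primary", "secondary"})

if __name__ == "__main__":
    print(len(synthesize(partial)), ambiguous_experiments(partial))
    print(synthesize(full), ambiguous_experiments(full))
```

In this toy, the partial specification is ambiguous because the candidate models disagree on the unperformed wild-type experiment. Once that experiment is added, two models remain that agree on every experiment yet encode different inhibition hypotheses, which loosely mirrors the notion of observationally equivalent models in the text. A practical synthesizer would search the model space symbolically rather than by enumeration.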
