Learning stable and predictive structures in kinetic systems

Significance

Many real-world systems can be described by a set of differential equations. Knowing these equations allows researchers to predict the system’s behavior under interventions, such as manipulations of initial or environmental conditions. For many complex systems, the differential equations are unknown. Deriving them by hand is infeasible for large systems, and data science is used to learn them from observational data. Existing techniques yield models that predict the observational data well, but fail to explain the effect of interventions. We propose an approach, CausalKinetiX, that explicitly takes into account stability across different experiments. This allows us to draw a more realistic picture of the system’s underlying causal structure and is a first step toward increasing reproducibility.

Learning kinetic systems from data is one of the core challenges in many fields. Identifying stable models is essential for the generalization capabilities of data-driven inference. We introduce a computationally efficient framework, called CausalKinetiX, that identifies structure from discrete time, noisy observations, generated from heterogeneous experiments. The algorithm assumes the existence of an underlying, invariant kinetic model, a key criterion for reproducible research. Results on both simulated and real-world examples suggest that learning the structure of kinetic systems benefits from a causal perspective. The identified variables and models allow for a concise description of the dynamics across multiple experimental settings and can be used for prediction in unseen experiments. We observe significant improvements compared to well-established approaches focusing solely on predictive performance, especially for out-of-sample generalization.
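To make the idea of stability across experiments concrete, the following is a minimal toy sketch and not the CausalKinetiX algorithm itself. It assumes a hypothetical target Y with dY/dt = theta * X1, a second variable X2 that merely tracks X1 in three experiments and is decoupled in a fourth, and it ranks candidate predictor sets by their worst held-out error when one experiment is left out at a time. All names (`simulate`, `stability_score`, `X1`, `X2`, `theta`) and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0, 31)   # shared discrete observation times
theta = 1.5                     # toy rate constant (assumed)

def simulate(a, spurious):
    """One experiment: X1 drives Y; X2 tracks X1 only if `spurious` is True."""
    x1 = a * np.exp(-t)
    y = theta * a * (1.0 - np.exp(-t))   # solves dY/dt = theta * X1 with Y(0) = 0
    x2 = 2.0 * x1 if spurious else np.exp(-3.0 * t)
    noise = lambda: rng.normal(scale=1e-3, size=t.size)
    return {"X1": x1 + noise(), "X2": x2 + noise(), "Y": y + noise()}

# Three experiments where X2 is a proxy for X1, one where that link is broken.
experiments = [simulate(1.0, True), simulate(2.0, True),
               simulate(3.0, True), simulate(2.5, False)]

def design(exp, names):
    """Stack the chosen predictor trajectories as columns."""
    return np.column_stack([exp[n] for n in names])

def dydt(exp):
    """Finite-difference estimate of the derivative of the noisy target."""
    return np.gradient(exp["Y"], t, edge_order=2)

def stability_score(names):
    """Worst held-out mean squared error over leave-one-experiment-out splits."""
    errors = []
    for held_out in range(len(experiments)):
        train = [e for i, e in enumerate(experiments) if i != held_out]
        A = np.vstack([design(e, names) for e in train])
        b = np.concatenate([dydt(e) for e in train])
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)   # linear fit, no intercept
        test = experiments[held_out]
        resid = dydt(test) - design(test, names) @ coef
        errors.append(float(np.mean(resid ** 2)))
    return max(errors)

candidates = [("X1",), ("X2",), ("X1", "X2")]
scores = {c: stability_score(c) for c in candidates}
for c, s in sorted(scores.items(), key=lambda kv: kv[1]):
    print(c, s)
```

In this setup the model built on X2 alone can fit the experiments in which X2 mirrors X1, but it fails once the experiment with the broken link is held out, so its worst-case score is far larger than that of the model built on X1. Scoring by the worst held-out experiment rather than by pooled fit is one simple way to favor models that stay valid across heterogeneous experiments; the actual method may combine stability and fit differently.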
