Causal discovery in the geosciences - Using synthetic data to learn how to interpret results

Causal discovery algorithms based on probabilistic graphical models have recently emerged in geoscience applications for the identification and visualization of dynamical processes. The key idea is to learn the structure of a graphical model from observed spatio-temporal data, thus finding pathways of interactions in the observed physical system. Studying those pathways allows geoscientists to learn subtle details about the underlying dynamical mechanisms governing our planet. Initial studies using this approach on real-world atmospheric data have shown great potential for scientific discovery. However, no ground truth was available in these initial studies, so the resulting graphs were evaluated only by whether a domain expert judged them physically plausible. The lack of ground truth is a typical problem when using causal discovery in the geosciences. Furthermore, while most of the connections found by this method match domain knowledge, we encountered one type of connection for which we found no explanation. To address both of these issues, we developed a simulation framework that generates synthetic data of typical atmospheric processes (advection and diffusion). Applying the causal discovery algorithm to the synthetic data allowed us (1) to develop a better understanding of how these physical processes appear in the resulting connectivity graphs, and thus how to better interpret such graphs when they are obtained from real-world data, and (2) to explain the previously unexplained connections.
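
To make the idea of synthetic advection-diffusion data with a known ground truth concrete, the following is a minimal sketch and not the authors' simulation framework. It assumes a one-dimensional periodic grid, an explicit finite-difference scheme with unit grid spacing and time step, random forcing at every grid point, and a small damping term that keeps the series stationary. All function names and parameter values are hypothetical. A simple lagged correlation stands in for the causal discovery algorithm, which the abstract does not specify; it only illustrates that the known flow direction leaves a detectable asymmetry in the data.

```python
import math
import random


def simulate_advection_diffusion(n_points=20, n_steps=2000, velocity=0.5,
                                 diffusivity=0.05, damping=0.1,
                                 noise_std=1.0, seed=0):
    """Return a list of snapshots of a noisy 1D advection-diffusion field.

    Assumed scheme (dx = dt = 1): upwind differences for advection
    (valid for velocity > 0) and central differences for diffusion.
    """
    rng = random.Random(seed)
    u = [0.0] * n_points
    series = []
    for _ in range(n_steps):
        new_u = []
        for i in range(n_points):
            left = u[i - 1]                  # periodic: u[-1] wraps around
            right = u[(i + 1) % n_points]
            advect = -velocity * (u[i] - left)
            diffuse = diffusivity * (right - 2.0 * u[i] + left)
            deterministic = (1.0 - damping) * (u[i] + advect + diffuse)
            new_u.append(deterministic + rng.gauss(0.0, noise_std))
        u = new_u
        series.append(u)
    return series


def lagged_correlation(series, src, dst, lag):
    """Pearson correlation between point src at time t and dst at t + lag."""
    x = [series[t][src] for t in range(len(series) - lag)]
    y = [series[t + lag][dst] for t in range(len(series) - lag)]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


series = simulate_advection_diffusion()
downstream = lagged_correlation(series, src=5, dst=6, lag=1)  # with the flow
upstream = lagged_correlation(series, src=6, dst=5, lag=1)    # against the flow
print(f"downstream: {downstream:.2f}, upstream: {upstream:.2f}")
```

Because the velocity is positive, information moves from lower to higher grid indices, so the downstream lagged correlation should exceed the upstream one. In a setting like this, the flow direction plays the role of the ground truth against which a recovered graph can be checked.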
