Causal Discovery of a River Network from its Extremes

Causal inference for extremes aims to discover cause-and-effect relations between large observed values of random variables. In recent years, a number of methods have been proposed for solving the Hidden River Problem, with the Danube data set as a benchmark. In this paper, we introduce QTree, a new and simple algorithm for the Hidden River Problem that outperforms existing methods. QTree returns a directed graph and achieves near-perfect recovery on the Danube data as well as on new data from the Lower Colorado River. It can handle missing data, has an automated parameter-tuning procedure, and runs in time O(n|V|), where n is the number of observations and |V| is the number of nodes in the graph. QTree relies on qualitative aspects of the max-linear Bayesian network model.
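For orientation, here is a minimal Python sketch, under stated assumptions, of the recursive max-linear model that QTree relies on. It is not the authors' QTree code. Each node takes the value X_i = max(max over parents k of c_ki X_k, c_ii Z_i), where the Z_i are independent heavy-tailed innovations. The function name `simulate_max_linear`, the choice of standard Fréchet noise, and the four-gauge toy network with its edge weights are hypothetical illustrations, not data from the Danube or the Lower Colorado River.

```python
import numpy as np

def simulate_max_linear(parents, weights, n, seed=None):
    """Draw n samples from a recursive max-linear model on a DAG (illustrative sketch).

    parents: dict mapping node -> list of parents; keys must be in topological order.
    weights: dict mapping (k, i) -> c_ki > 0 for each edge k -> i, plus (i, i) -> c_ii.
    Innovations Z_i are assumed i.i.d. standard Frechet(1), i.e. heavy-tailed.
    """
    rng = np.random.default_rng(seed)
    nodes = list(parents)
    # Inverse-CDF sampling: P(Z <= z) = exp(-1/z)  =>  Z = -1 / log(U), U ~ Uniform(0, 1)
    Z = -1.0 / np.log(rng.uniform(size=(n, len(nodes))))
    X = {}
    for col, i in enumerate(nodes):
        # X_i = max( max_{k in pa(i)} c_ki * X_k , c_ii * Z_i )
        x = weights[(i, i)] * Z[:, col]
        for k in parents[i]:
            x = np.maximum(x, weights[(k, i)] * X[k])
        X[i] = x
    return X

# Hypothetical toy network: gauges A and B drain into C, which drains into D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
weights = {("A", "A"): 1.0, ("B", "B"): 1.0, ("C", "C"): 0.5, ("D", "D"): 0.5,
           ("A", "C"): 0.8, ("B", "C"): 0.6, ("C", "D"): 0.9}
X = simulate_max_linear(parents, weights, n=1000, seed=0)

# Each downstream value dominates every scaled upstream value, sample by sample.
print(np.all(X["C"] >= 0.8 * X["A"]), np.all(X["D"] >= 0.9 * X["C"]))  # True True
```

The printed check illustrates one qualitative property of max-linear models: on every edge k → i, X_i ≥ c_ki X_k holds for every single observation, whatever the noise realizations are.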
