Estimating a Latent Tree for Extremes

The Latent River Problem has emerged as a flagship problem for causal discovery in extreme value statistics. This paper presents QTree, a simple and efficient algorithm for the Latent River Problem that outperforms existing methods. QTree returns a directed graph and achieves almost perfect recovery on the Upper Danube, the existing benchmark dataset, as well as on new data from the Lower Colorado River in Texas. It can handle missing data, has an automated parameter-tuning procedure, and runs in time O(n|V|²), where n is the number of observations and |V| is the number of nodes in the graph. In addition, under a Bayesian network model for extreme values with propagating noise, we show that the QTree estimator almost surely returns the correct tree as n → ∞.
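The abstract states QTree's guarantees but not its scoring rule, so the following is only a minimal sketch of one generic way to turn extreme observations into a directed tree; it is not the paper's estimator. It assumes (i) a hypothetical pairwise score `pair_score`, the spread between two empirical quantiles of log(X_j) − log(X_i) over the rows where X_i is among its largest values, motivated by the max-linear intuition that an extreme upstream value is passed on, roughly scaled, to the station downstream; (ii) pairwise deletion of missing values; and (iii) a known outlet station used as the root, with the tree extracted by the Chu-Liu/Edmonds minimum spanning arborescence algorithm, swapped in here as a standard tool. All function names are illustrative. Scoring every ordered pair touches each of the n observations once per pair, which is where an n|V|² term arises; the sorting used for the quantiles in this sketch adds a log n factor.

```python
# Illustrative sketch only; the score and tree-extraction step are assumptions,
# not the QTree estimator described in the paper.
import math
from itertools import count

_fresh = count()  # supplies unique labels for contracted cycles


def pair_score(xs, ys, top=0.1, lo=0.1, hi=0.9):
    """Hypothetical score for a candidate flow edge x -> y (smaller = more
    plausible): the spread between the `hi` and `lo` empirical quantiles of
    log(y) - log(x) over the rows where x is among its `top` largest values.
    Rows where either value is missing (None) are dropped pairwise."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None and x > 0 and y > 0]
    if len(pairs) < 2:
        return math.inf
    pairs.sort(key=lambda p: p[0], reverse=True)
    k = max(2, int(top * len(pairs)))
    diffs = sorted(math.log(y) - math.log(x) for x, y in pairs[:k])
    return diffs[int(hi * (k - 1))] - diffs[int(lo * (k - 1))]


def _find_cycle(nodes, parent, root):
    """Return one cycle of the `parent` pointers as a list of nodes, or None."""
    done = {root}
    for start in nodes:
        path, on_path, v = [], set(), start
        while v not in done and v not in on_path:
            path.append(v)
            on_path.add(v)
            v = parent[v]
        if v in on_path:
            return path[path.index(v):]
        done.update(path)
    return None


def min_arborescence(nodes, edges, root):
    """Chu-Liu/Edmonds: minimum-weight spanning arborescence rooted at `root`.
    `edges` maps (u, v) -> weight; returns the chosen set of (u, v) pairs."""
    best_in = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best_in or w < edges[(best_in[v], v)]):
            best_in[v] = u
    if any(v not in best_in for v in nodes if v != root):
        raise ValueError("a node has no incoming edge, so no arborescence exists")
    cycle = _find_cycle(nodes, best_in, root)
    if cycle is None:
        return {(u, v) for v, u in best_in.items()}
    in_cycle, c = set(cycle), ("cycle", next(_fresh))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:  # entering the cycle: charge only the swap cost
            key, w = (u, c), w - edges[(best_in[v], v)]
        elif u in in_cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_edges or w < new_edges[key]:
            new_edges[key], origin[key] = w, (u, v)
    new_nodes = [x for x in nodes if x not in in_cycle] + [c]
    chosen = {origin[e] for e in min_arborescence(new_nodes, new_edges, root)}
    entry = next(v for _, v in chosen if v in in_cycle)  # cycle node re-parented
    chosen |= {(best_in[x], x) for x in cycle if x != entry}
    return chosen


def estimate_tree(data, root, top=0.1):
    """data: station -> list of observations (None = missing); `root` is the
    assumed-known outlet. Returns estimated flow edges (upstream, downstream)."""
    names = list(data)
    edges = {}
    for i in names:  # candidate upstream station
        for j in names:  # candidate downstream station
            if i != j:
                w = pair_score(data[i], data[j], top=top)
                if math.isfinite(w):
                    edges[(j, i)] = w  # arborescence edge j -> i encodes flow i -> j
    return {(i, j) for j, i in min_arborescence(names, edges, root)}
```

Reversing the edges lets a standard arborescence routine, in which every non-root node has exactly one parent, encode the river structure, in which every station other than the outlet drains into exactly one downstream station.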