Learning from limited temporal data: Dynamically sparse historical functional linear models with applications to Earth science

Scientists and statisticians often want to learn about the complex relationships between two variables that vary over time. Recent work shows that sparse functional historical linear models are promising for this purpose, but notable limitations remain. Most importantly, previous work has imposed sparsity on the coefficient function without allowing the sparsity, and hence the lag, to vary with time. We simplify the framework of sparse functional historical linear models by using a rectangular coefficient structure together with Whittaker smoothing, and then relax previous frameworks by estimating a dynamic time lag from a hierarchical coefficient structure. Our study is motivated by the aim of extracting the physical rainfall-runoff processes hidden within hydrological data. We demonstrate the promise and accuracy of our method using four simulation studies, supported by two real hydrological data sets.
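
For readers unfamiliar with Whittaker smoothing, the following is a minimal sketch of the classical one-dimensional smoother, which solves (I + λDᵀD)z = y for a difference matrix D. It is an illustration under stated assumptions, not the estimator used in this work: the function name `whittaker_smooth`, the parameters `lam` and `order`, and the synthetic test signal are all hypothetical choices made for this example.

```python
import numpy as np


def whittaker_smooth(y, lam=10.0, order=2):
    """Classical Whittaker smoother (illustrative sketch, dense matrices).

    Minimizes ||y - z||^2 + lam * ||D z||^2, where D takes
    `order`-th differences of z. Larger `lam` gives a smoother z.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # D has shape (n - order, n); row i computes the order-th difference at i.
    D = np.diff(np.eye(n), n=order, axis=0)
    # The minimizer solves the linear system (I + lam * D'D) z = y.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


# Hypothetical usage: smooth a noisy sine wave (synthetic data, not from the paper).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
noisy = np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=t.size)
smooth = whittaker_smooth(noisy, lam=50.0)
```

The dense solve keeps the sketch short; for long series a sparse banded solver would be the more usual choice.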
