Selection of Temporal Lags for Predicting Riverflow Series from Hydroelectric Plants Using Variable Selection Methods

Forecasting monthly seasonal streamflow time series is an important problem for countries where hydroelectric plants contribute significantly to electric power generation. Predicting such series is a main step in planning the operation of the electric sector, since it allows behaviors and problems to be anticipated. Many proposals in the literature focus only on determining the best forecasting model. However, the correct selection of input variables, which in a univariate model are the lags of the series to be forecast, is essential for forecasting accuracy. Variable selection methods can address this task, and the performance of the predictors depends directly on this stage. In the present study, we investigate the performance of ten approaches, comprising linear and non-linear filters, wrappers, and bio-inspired metaheuristics. The predictors addressed are extreme learning machine neural networks, representing the non-linear approaches, and linear autoregressive models from the Box and Jenkins methodology. The computational results for five series from hydroelectric plants indicate that the wrapper methodology is adequate for the non-linear predictor, whereas the linear models are better adjusted using filters.
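To make the idea of lag selection concrete, the following is a minimal sketch of a wrapper in its simplest greedy form: candidate lags are added one at a time, and a lag is kept only if it lowers the error of the predictor on held-out data. It is an illustration under stated assumptions, not the procedure used in the study. The predictor here is a least-squares autoregressive model standing in for any forecaster, and the function names (`lag_matrix`, `validation_mse`, `forward_wrapper`), the 70/30 split, and the synthetic series are all hypothetical choices made for the example.

```python
import numpy as np


def lag_matrix(series, max_lag):
    """Return (X, y) where column k-1 of X is the series delayed by k steps."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    columns = [series[max_lag - k : n - k] for k in range(1, max_lag + 1)]
    return np.column_stack(columns), series[max_lag:]


def validation_mse(X_train, y_train, X_val, y_val):
    """Fit a linear AR model with intercept by least squares; return validation MSE."""
    A_train = np.column_stack([np.ones(len(y_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_val = np.column_stack([np.ones(len(y_val)), X_val])
    return float(np.mean((A_val @ coef - y_val) ** 2))


def forward_wrapper(series, max_lag=6, train_fraction=0.7):
    """Greedy forward selection of lags (assumed wrapper variant, for illustration)."""
    X, y = lag_matrix(series, max_lag)
    split = int(train_fraction * len(y))  # chronological split, no shuffling
    selected, best_mse = [], np.inf
    remaining = list(range(1, max_lag + 1))
    while remaining:
        scores = {}
        for lag in remaining:
            cols = [k - 1 for k in selected + [lag]]
            scores[lag] = validation_mse(
                X[:split][:, cols], y[:split], X[split:][:, cols], y[split:]
            )
        best_lag = min(scores, key=scores.get)
        if scores[best_lag] >= best_mse:
            break  # no remaining lag improves the validation error
        selected.append(best_lag)
        remaining.remove(best_lag)
        best_mse = scores[best_lag]
    return sorted(selected), best_mse


# Synthetic series (not real streamflow data) that depends only on lag 2.
rng = np.random.default_rng(0)
demo = np.zeros(600)
for t in range(2, len(demo)):
    demo[t] = 0.8 * demo[t - 2] + rng.normal()

lags, mse = forward_wrapper(demo, max_lag=6)
print("selected lags:", lags, "validation MSE:", round(mse, 3))
```

A filter would instead rank the lags by a statistic computed independently of the predictor, such as a correlation or mutual information measure, which is cheaper but ignores how the chosen predictor actually uses the inputs.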
