Application of penalized linear regression and ensemble methods for drought forecasting in Northeast China

Effective drought prediction can help mitigate some of the effects of drought. Machine learning algorithms are increasingly used to develop drought prediction models because of their high efficiency and accuracy. This study explored the ability of several machine learning models, based on penalized linear regression and decision tree (DT)-based ensemble methods, to predict drought conditions represented by the Standardized Precipitation–Evapotranspiration Index (SPEI) in Northeast China. We compared the forecasting performance of penalized linear regression models based on ridge regression (RR) and lasso regression (LR) with that of the ordinary least squares (OLS) regression model. In addition, the AdaBoost and Random Forests (RF) models were used to explore whether ensemble methods improve forecasting performance. The SPEI was forecast at timescales of 3, 6, 12, and 24 months using these models, and the resulting indices were used to predict short-term and long-term drought conditions. The results indicated that the penalized linear regression models provided better predictions than the OLS model and that the ensemble methods consistently outperformed the DT model. Overall, the LR models were the best models for forecasting the SPEI at different timescales in Northeast China.
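As a rough illustration of the penalized-regression comparison described above, the following minimal sketch fits OLS, ridge, and lasso models to lagged values of a single series. It is not the study's implementation. The series is a synthetic AR(1) process standing in for a monthly SPEI record, and the lag count, penalty strengths, train/test split, and helper names (`make_lagged`, `fit_penalized`, and so on) are all assumptions made for the example. The three models share one coordinate-descent solver: both penalties set to zero gives OLS, an L2 penalty gives ridge, and an L1 penalty gives lasso. The tree-based ensembles (AdaBoost, RF) are not covered here.

```python
import math
import random


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    X = [list(series[t - n_lags:t]) for t in range(n_lags, len(series))]
    y = [series[t] for t in range(n_lags, len(series))]
    return X, y


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_penalized(X, y, l1=0.0, l2=0.0, n_iter=200):
    """Coordinate descent for (1/2n)||y - b - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2.

    l1 = l2 = 0 corresponds to OLS, l2 > 0 to ridge, l1 > 0 to lasso.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    resid = [yi - b for yi in y]  # residuals for the current (w, b)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, l1) / (z + l2)
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new_w
        shift = sum(resid) / n  # re-center residuals by updating the intercept
        b += shift
        resid = [r - shift for r in resid]
    return w, b


def predict(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic AR(1) series standing in for a monthly SPEI record (NOT real data).
random.seed(0)
series = [0.0]
for _ in range(299):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

X, y = make_lagged(series, n_lags=6)
split = int(0.8 * len(X))  # chronological split: no shuffling for time series
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

# Penalty strengths are arbitrary illustrative values, not tuned.
settings = {"OLS": (0.0, 0.0), "ridge": (0.0, 1.0), "lasso": (0.1, 0.0)}
for name, (l1, l2) in settings.items():
    w, b = fit_penalized(X_tr, y_tr, l1=l1, l2=l2)
    print(name, "test RMSE:", round(rmse(y_te, predict(X_te, w, b)), 3),
          "nonzero weights:", sum(1 for wj in w if wj != 0.0))
```

The solver is written in plain Python so the example has no dependencies. In practice a library such as scikit-learn, with penalty strengths chosen by cross-validation, would be the more usual choice.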
