Imbalanced classification techniques for monsoon forecasting based on a new climatic time series

Abstract. Monsoons have been widely studied because of their impact on precipitation and temperature in many regions of the world. In this work, data mining techniques, specifically imbalanced classification techniques, are used to assess whether climate indices can capture and forecast the evolution of the Western North Pacific Summer Monsoon. The main goal is to predict, one month ahead, whether the monsoon will be extreme. First, a new monthly index related to the intensity of the monsoon is generated. Then, the forecasting problem is transformed into a binary imbalanced classification problem, and a set of representative techniques (tree-based models, rule-based models, black-box models and ensemble techniques) is applied to obtain the forecasts. According to the quality measures evaluated, the proposed methodology gives promising results and predicts extreme monsoons one month ahead with high accuracy.
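The transformation from a forecasting problem into a binary imbalanced classification problem can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how "extreme" is defined, so the 0.9 quantile threshold, the function names, and the synthetic data are all hypothetical. Random oversampling is used here as a simple stand-in for a rebalancing step and is not claimed to be the resampling method of the paper.

```python
import numpy as np

def make_extreme_labels(index, quantile=0.9):
    """Label a month as extreme (1) when its index exceeds the chosen quantile.
    The quantile is an assumed definition, not taken from the paper."""
    index = np.asarray(index, dtype=float)
    threshold = np.quantile(index, quantile)
    return (index > threshold).astype(int)

def build_one_month_ahead_dataset(predictors, labels):
    """Pair the predictors of month t with the label of month t + 1."""
    predictors = np.asarray(predictors, dtype=float)
    labels = np.asarray(labels)
    return predictors[:-1], labels[1:]

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_rows = np.flatnonzero(y == minority)
    extra = rng.choice(minority_rows, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

# Synthetic stand-ins for a monthly monsoon index and three climate indices.
rng = np.random.default_rng(42)
n_months = 240
monsoon_index = rng.normal(size=n_months)
climate_indices = rng.normal(size=(n_months, 3))

labels = make_extreme_labels(monsoon_index, quantile=0.9)
X, y = build_one_month_ahead_dataset(climate_indices, labels)
X_bal, y_bal = random_oversample(X, y)

print("extreme months:", int(y.sum()), "of", len(y))
print("balanced class sizes:", int((y_bal == 0).sum()), int((y_bal == 1).sum()))
```

In practice the rebalancing step would be applied only to the training portion of the series, so that duplicated minority examples do not leak into the evaluation period.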
