C5.0 Algorithm and Synthetic Minority Oversampling Technique (SMOTE) for Rainfall Forecasting in Bandung Regency

Weather affects many human activities, so weather predictions need to be highly accurate. Data mining is one of the approaches used to predict rainfall. In this study, a classification model was developed using the C5.0 algorithm to forecast rainfall in Bandung Regency, and the SMOTE algorithm was used to address class imbalance in the dataset. Weather data for model development were obtained from the Meteorological, Climatological, and Geophysical Agency (BMKG) of Bandung and cover the years 2005 to 2017. The model was validated using k-fold cross-validation. The C5.0 model reached a highest accuracy of 92% on the imbalanced dataset, while its accuracy on the dataset oversampled with SMOTE was 99%.
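As a rough illustration of the oversampling step, the sketch below shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a minimal standard-library sketch, not the implementation used in the study; the function name `smote`, its parameters, and the toy feature values (temperature and humidity) are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Assumes `minority` holds at least two numeric feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: [temperature in deg C, relative humidity in %]
minority = [[24.1, 91.0], [23.5, 94.0], [24.8, 89.0], [23.9, 92.5]]
majority_count = 12  # assumed size of the majority class

# Generate enough synthetic samples to match the majority class size.
new_samples = smote(minority, n_synthetic=majority_count - len(minority), k=3)
balanced_minority = minority + new_samples
print(len(balanced_minority))  # 12
```

Because every synthetic point is an interpolation between two real minority samples, the new data stay inside the region already occupied by the minority class instead of simply duplicating existing rows.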
