Comparative Analysis of Optimization Algorithms on Random Forest for Classification of Bank Marketing Data

In banking, marketers must reduce lending risk by keeping customers from falling into non-performing loans. One way to reduce this risk is to apply data mining, which offers powerful classification techniques for extracting meaningful and useful information from large amounts of data. The Random Forest (RF) algorithm is a classification algorithm that can handle class imbalance. However, several references state that an optimization algorithm is needed to improve the classification results of RF. Such optimization can be done using Bagging and the Genetic Algorithm (GA). This study aims to classify Bank Marketing data on loan application acceptance, taken from the www.data.world site. Classification is carried out with the RF algorithm to obtain a predictive model for loan application acceptance with optimal accuracy. The study also compares RF optimized with Bagging against RF optimized with the Genetic Algorithm. The tests show that the best classification performance on the Bank Marketing data is obtained with the plain RF algorithm, with an accuracy of 88.30%, AUC (+) of 0.500, and AUC (-) of 0.000. Neither Bagging nor the Genetic Algorithm was able to improve the performance of the RF algorithm on this data.
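
The reported pairing of a high accuracy (88.30%) with an AUC of 0.500 is the pattern that can arise on imbalanced data when a model ranks the minority class no better than chance. The following is a minimal sketch of that arithmetic, not the authors' analysis or their stated explanation. The class counts (883 negatives, 117 positives) are hypothetical values chosen only so that the majority class share matches the reported accuracy, and the helper names `accuracy` and `auc` are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced labels: 883 negatives, 117 positives (assumed).
y_true = [0] * 883 + [1] * 117

# A model that gives every sample the same score and always predicts
# the majority class.
constant_scores = [0.0] * len(y_true)
majority_pred = [0] * len(y_true)

majority_accuracy = accuracy(y_true, majority_pred)
majority_auc = auc(y_true, constant_scores)

# For contrast, a scorer that ranks every positive above every negative.
perfect_auc = auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])

print(f"accuracy = {majority_accuracy:.3f}, AUC = {majority_auc:.3f}")
print(f"perfect ranker AUC = {perfect_auc:.3f}")
```

Under these assumed counts, accuracy alone looks strong while AUC shows no ranking ability, which is why the two metrics are worth reading together when comparing RF with its Bagging and GA variants.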
