Performance of ACO-based Decision Tree Algorithm with Imbalanced Class Data Sets - A Heuristic Approach

Predicting the minority class can be like finding a needle in a haystack. Bio-inspired classifiers such as the Ant Colony Optimization (ACO) decision tree produce ineffective decision boundaries because their entropy-based heuristic is distorted by the strong presence of the majority class. Consequently, the induced trees are dominated by the likelihood of the majority class, and the rare class is under-represented. The proposed algorithm, the Hellinger-Ant-Tree-Miner (HATM), uses a class skew-insensitive heuristic. It was compared with the Ant-Tree-Miner (ATM) through a simulation study and an application to 15 imbalanced data sets. Simulation results revealed an advantage of HATM over ATM under skewed class distributions as the number of covariates and the sample size increase. Experiments with real data indicate that HATM potentially improves on ATM, as measured by balanced accuracy (BACC), F-measure and minority class prediction (MCP). Friedman tests support the conclusion that HATM performed better than ATM while remaining competitive with other well-known tree-based classifiers.
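The contrast between an entropy-based heuristic and a class skew-insensitive one can be illustrated on a single split. The sketch below is a minimal illustration, not the authors' implementation: it compares plain information gain with the Hellinger distance split criterion known from Hellinger distance decision trees. It assumes a two-class problem, a two-way split and both classes present at the node. The helper names are hypothetical, and the exact form of the heuristic inside HATM's ant-based tree construction is not given here.

```python
import math
from typing import Sequence

# Illustrative sketch (hypothetical helpers, not HATM's implementation).
# Each branch is a (minority_count, majority_count) pair; a two-class,
# two-way split with both classes present at the node is assumed.


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a vector of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(left: Sequence[int], right: Sequence[int]) -> float:
    """Entropy reduction achieved by splitting a node into left/right."""
    parent = [lc + rc for lc, rc in zip(left, right)]
    n = sum(parent)
    children = sum(sum(b) / n * entropy(b) for b in (left, right))
    return entropy(parent) - children


def hellinger_distance(left: Sequence[int], right: Sequence[int]) -> float:
    """Hellinger distance between P(branch | minority) and P(branch | majority).

    Only within-class proportions appear, so rescaling either class's
    total size (i.e. changing the class skew) leaves the value unchanged.
    """
    minority_total = left[0] + right[0]
    majority_total = left[1] + right[1]
    return math.sqrt(sum(
        (math.sqrt(b[0] / minority_total) - math.sqrt(b[1] / majority_total)) ** 2
        for b in (left, right)
    ))


if __name__ == "__main__":
    # Same within-class split proportions; majority class has 10 vs 1,000 instances.
    balanced = ((8, 2), (2, 8))
    skewed = ((8, 200), (2, 800))
    for name, split in (("balanced", balanced), ("skewed", skewed)):
        print(name, round(information_gain(*split), 4),
              round(hellinger_distance(*split), 4))
```

Here the minority class is split 8/2 and the majority class 2/8 across the two branches. The Hellinger distance stays at √0.4 ≈ 0.632 whether the majority class has 10 or 1,000 instances. Information gain, by contrast, falls from about 0.28 to roughly 0.012 as the skew grows, even though the split separates the classes equally well in both cases.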

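The three evaluation measures named in the abstract can be computed from a binary confusion matrix. The sketch below assumes two classes and takes the F-measure as F1 with the minority class as the positive class. It also treats MCP as the proportion of minority instances predicted correctly (minority recall). That reading is an assumption, because the abstract does not define MCP. The function name `imbalance_metrics` is hypothetical.

```python
from typing import Hashable, Sequence

# Illustrative sketch for a two-class problem. Treating MCP as the minority
# class recall is an assumption; the abstract does not define it.


def imbalance_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], minority: Hashable
) -> dict:
    """Return BACC, minority-class F-measure and (assumed) MCP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == minority and p == minority for t, p in pairs)
    fn = sum(t == minority and p != minority for t, p in pairs)
    fp = sum(t != minority and p == minority for t, p in pairs)
    tn = len(pairs) - tp - fn - fp

    tpr = tp / (tp + fn) if tp + fn else 0.0        # minority recall
    tnr = tn / (tn + fp) if tn + fp else 0.0        # majority recall
    precision = tp / (tp + fp) if tp + fp else 0.0  # minority precision
    f_measure = (2 * precision * tpr / (precision + tpr)
                 if precision + tpr else 0.0)
    return {"bacc": (tpr + tnr) / 2, "f_measure": f_measure, "mcp": tpr}


if __name__ == "__main__":
    y_true = ["maj"] * 95 + ["min"] * 5
    # Always predicting the majority class gives 95% accuracy...
    print(imbalance_metrics(y_true, ["maj"] * 100, minority="min"))
    # -> {'bacc': 0.5, 'f_measure': 0.0, 'mcp': 0.0}
```

The trivial all-majority classifier is 95% accurate on this sample, yet its BACC is only 0.5 and its F-measure and MCP are 0. This shows why accuracy alone is uninformative when one class is rare.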