Oblique Decision Tree Algorithm with Minority Condensation for Class Imbalanced Problem

In recent years, handling datasets with an imbalanced number of instances per class has become a significant issue in classification. Classifier modification is one of the well-known techniques for dealing with this issue. In this paper, an effective classification model based on an oblique decision tree is enhanced to work with imbalanced datasets; the result is called the oblique minority condensed decision tree (OMCT). Initially, it selects the best axis-parallel hyperplane based on the decision tree algorithm, using the minority entropy of the instances within the minority inner fence. It then perturbs this hyperplane along each axis to improve its minority entropy, and finally perturbs it stochastically to escape local optima. Experimental results on 18 real-world datasets from the UCI repository show that OMCT significantly outperforms six state-of-the-art decision tree algorithms, namely CART, C4.5, OC1, AE, DCSM and ME, in terms of precision, recall and F1 score. Moreover, the decision trees produced by OMCT are significantly smaller than those of the other algorithms.
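Based on the abstract's description, the split-selection and perturbation steps can be sketched as follows. This is a minimal illustration, not the paper's exact method: the minority-entropy measure is simplified to the binary entropy of the minority-class proportion, the inner-fence instance selection is omitted, and all function names are assumptions introduced here.

```python
import numpy as np

def minority_entropy(y, minority_label):
    # Simplified stand-in for the paper's minority-entropy measure:
    # binary entropy of the minority-class proportion in a node.
    if len(y) == 0:
        return 0.0
    p = np.mean(y == minority_label)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def split_score(X, y, w, b, minority_label):
    # Size-weighted minority entropy of the two sides of the
    # hyperplane w . x <= b (lower is better).
    left = X @ w <= b
    n = len(y)
    return (left.sum() / n) * minority_entropy(y[left], minority_label) \
         + ((~left).sum() / n) * minority_entropy(y[~left], minority_label)

def best_axis_parallel(X, y, minority_label):
    # Step 1: exhaustively pick the best axis-parallel hyperplane.
    best_w, best_b, best_s = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            w = np.zeros(X.shape[1])
            w[j] = 1.0
            s = split_score(X, y, w, t, minority_label)
            if s < best_s:
                best_w, best_b, best_s = w, t, s
    return best_w, best_b, best_s

def stochastic_perturb(X, y, w, b, minority_label, rng, iters=50):
    # Steps 2-3 (compressed): randomly perturb the hyperplane
    # coefficients, keeping only changes that lower minority entropy.
    score = split_score(X, y, w, b, minority_label)
    for _ in range(iters):
        cand = w + rng.normal(scale=0.1, size=w.shape)
        s = split_score(X, y, cand, b, minority_label)
        if s < score:
            w, score = cand, s
    return w, b, score
```

Applied recursively at each node, this greedy selection followed by oblique perturbation yields the tree structure the abstract describes; the stochastic step is what lets the hyperplane tilt away from the axis-parallel local optimum.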
