Ensembles of $\alpha$-Trees for Imbalanced Classification Problems

This paper introduces two kinds of decision tree ensembles for imbalanced classification problems, both of which exploit properties of the α-divergence. First, a novel splitting criterion based on α-divergence is shown to generalize several well-known splitting criteria, including those used in C4.5 and CART. When this criterion is applied to imbalanced data, varying the value of α yields decision trees that tend to be less correlated (α-diversification); the increased diversity of an ensemble of such trees improves AUROC across a range of minority-class priors. The second ensemble uses the same α-trees as base classifiers but applies a lift-aware stopping criterion during tree growth. The resulting ensemble produces a set of interpretable rules with higher lift at a given coverage, a property that is highly desirable in applications such as direct marketing. Experimental results on many class-imbalanced data sets, including the BRFSS and MIMIC data sets from the medical domain and several data sets from the UCI and KEEL repositories, demonstrate the effectiveness of the proposed ensembles over a wide range of data distributions and degrees of class imbalance.
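As context for the first ensemble, one common parameterization of the α-divergence between discrete distributions $p$ and $q$ is $D_\alpha(p \| q) = \frac{1}{\alpha(1-\alpha)}\left(1 - \sum_i p_i^{\alpha} q_i^{1-\alpha}\right)$, which recovers $\mathrm{KL}(p \| q)$ as $\alpha \to 1$ and $\mathrm{KL}(q \| p)$ as $\alpha \to 0$. The sketch below scores a candidate binary split as the α-divergence between the empirical joint distribution of (class, branch) and the product of its marginals; at $\alpha \to 1$ this is mutual information, i.e. a C4.5-style information-gain criterion. The function names and this particular formulation are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def alpha_divergence(p, q, alpha):
        """Amari alpha-divergence D_alpha(p || q) for discrete distributions.
        Reduces to KL(p || q) as alpha -> 1 and KL(q || p) as alpha -> 0."""
        eps = 1e-12  # guard against zero probabilities
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        if abs(alpha - 1.0) < 1e-8:        # KL(p || q) limit
            return float(np.sum(p * np.log(p / q)))
        if abs(alpha) < 1e-8:              # KL(q || p) limit
            return float(np.sum(q * np.log(q / p)))
        return float((1.0 - np.sum(p**alpha * q**(1.0 - alpha)))
                     / (alpha * (1.0 - alpha)))

    def alpha_split_gain(y, branch, alpha):
        """Score a candidate binary split (branch in {0, 1}) as the
        alpha-divergence between the empirical joint distribution of
        (class, branch) and the product of its marginals. At alpha -> 1
        this equals mutual information, i.e. information gain."""
        y, branch = np.asarray(y), np.asarray(branch)
        classes = np.unique(y)
        joint = np.array([[np.mean((y == c) & (branch == b))
                           for b in (0, 1)] for c in classes])
        indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        return alpha_divergence(joint.ravel(), indep.ravel(), alpha)

Growing each tree of the ensemble with a different value of α (say, sampled from a range around 1) is one plausible way to realize the α-diversification the abstract describes, since trees built under different criteria tend to make less correlated errors.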
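For the second ensemble, the lift-aware stopping criterion can be motivated by the standard direct-marketing notion of lift: the positive rate among examples covered by a rule divided by the base positive rate, so a lift of 2 means the rule's segment responds at twice the population rate. A node whose rule already achieves high lift with adequate support need not be split further, which keeps the extracted rules short. The check below, including its function names and thresholds, is a hypothetical sketch of that idea rather than the paper's actual criterion.

    def leaf_lift(y_leaf, base_rate):
        """Lift of the rule a leaf defines: the minority-class rate inside
        the leaf divided by the overall minority-class prior (labels are
        1 for the minority class, 0 otherwise)."""
        return (sum(y_leaf) / len(y_leaf)) / base_rate

    def should_stop(y_node, base_rate, min_lift=2.0, min_support=30):
        """Hypothetical lift-aware stopping check: stop expanding a node
        once it defines a sufficiently high-lift rule with enough support.
        Thresholds are illustrative, not taken from the paper."""
        if len(y_node) < min_support:
            return True   # too little data to split reliably
        return leaf_lift(y_node, base_rate) >= min_lift

Stopping early at high-lift nodes trades some purity for shorter, more interpretable rules at a given coverage, which matches the direct-marketing use case the abstract highlights.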
