Random Ordinality Ensembles: Ensemble methods for multi-valued categorical data

Data with multi-valued categorical attributes can cause major problems for decision trees: the high branching factor leads to data fragmentation, where decisions are made with little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that reduces this problem and significantly improves accuracy over current ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing a random ordering on the values of each categorical attribute. Because the transformation is a random process, each training dataset receives a different imposed ordinality. A decision tree learned on this continuous space can use binary splits, which reduces the data fragmentation problem, and these binary trees are generally accurate. The diverse training datasets in turn produce diverse decision trees in the ensemble. We present two variants of the technique: in the first, the decision trees trained on the transformed data serve directly as the base models of the ensemble; in the second, we combine the attribute randomisation of Random Subspaces with Random Ordinality. Both variants match or outperform other popular ensemble methods. We also study several properties of these ensembles. The study suggests that random ordinality trees are generally more accurate and smaller than multi-way split decision trees, and that random ordinality attributes can be used to improve the Bagging and AdaBoost.M1 ensemble methods.
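
The transformation at the heart of ROE can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the class name RandomOrdinalityEnsemble, the value-to-rank mapping, and the use of scikit-learn's DecisionTreeClassifier as the binary-split base learner are all choices made for this example.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier  # binary-split (CART-style) trees

class RandomOrdinalityEnsemble:
    """Illustrative sketch of Random Ordinality Ensembles (ROE).

    Each ensemble member imposes a different random ordering on the values of
    every categorical attribute, maps the data into that ordinal (numeric)
    space, and fits a binary-split decision tree on the result.
    """

    def __init__(self, n_estimators=50, random_state=None):
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(random_state)
        self.members = []  # list of (per-attribute value->rank maps, fitted tree)

    def _random_orderings(self, X):
        # For each categorical column, assign a random rank to each distinct value.
        orderings = []
        for j in range(X.shape[1]):
            values = np.unique(X[:, j])
            ranks = self.rng.permutation(len(values))
            orderings.append(dict(zip(values, ranks)))
        return orderings

    def _transform(self, X, orderings):
        # Map categories to their imposed ordinal ranks; unseen values map to -1.
        Z = np.empty(X.shape, dtype=float)
        for j, mapping in enumerate(orderings):
            Z[:, j] = [mapping.get(v, -1) for v in X[:, j]]
        return Z

    def fit(self, X, y):
        X = np.asarray(X, dtype=object)
        for _ in range(self.n_estimators):
            orderings = self._random_orderings(X)
            tree = DecisionTreeClassifier()
            tree.fit(self._transform(X, orderings), y)
            self.members.append((orderings, tree))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=object)
        votes = [tree.predict(self._transform(X, orderings))
                 for orderings, tree in self.members]
        # Majority vote across ensemble members, one prediction per sample.
        return np.array([Counter(col).most_common(1)[0][0]
                         for col in zip(*votes)])
```

Because every member draws its own orderings, the trees differ even though they are trained on the same examples. The second ROE variant described above could be imitated by additionally restricting each member to a random subset of attributes, in the spirit of Random Subspaces.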
