Random Prism: An Alternative to Random Forests

Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose combined classification results are used to increase the overall classification accuracy. In most ensemble classifiers the base classifiers follow the Top-Down Induction of Decision Trees (TDIDT) approach. An alternative approach to inducing rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure, and Prism rulesets achieve accuracy comparable to, and on noisy and large datasets sometimes higher than, that of decision tree classifiers. Yet Prism still suffers from overfitting on such data. In practice, ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first ensemble learner based on the Prism family of algorithms, designed to enhance Prism's classification accuracy by reducing overfitting.
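To make the idea concrete, below is a minimal Python sketch of what such an ensemble might look like: a Cendrowska-style Prism rule inducer as the base classifier, combined by bagging over bootstrap samples with Random-Forest-style random attribute subsets and a simple majority vote. This is an illustrative reconstruction under stated assumptions, not the paper's actual design; the names (PrismClassifier, RandomPrismEnsemble), the sqrt(d) subset-size heuristic, and the unweighted vote are all assumptions made for exposition.

```python
import random
from collections import Counter


class PrismClassifier:
    """Cendrowska-style Prism base classifier: induces modular rules,
    one target class at a time, on categorical attributes."""

    def fit(self, rows, labels):
        # rows: list of dicts mapping attribute name -> categorical value
        self.rules = []                                   # (terms, class) pairs
        self.default = Counter(labels).most_common(1)[0][0]
        for cls in set(labels):
            data = list(zip(rows, labels))
            # induce rules until every instance of this class is covered
            while any(l == cls for _, l in data):
                terms, covered = [], data
                while not all(l == cls for _, l in covered):
                    best = self._best_term(covered, cls, [a for a, _ in terms])
                    if best is None:                      # no attributes left
                        break
                    terms.append(best)
                    covered = [(r, l) for r, l in covered
                               if r[best[0]] == best[1]]
                self.rules.append((terms, cls))
                # remove the instances the new rule covers
                data = [(r, l) for r, l in data
                        if not all(r[a] == v for a, v in terms)]
        return self

    def _best_term(self, covered, cls, used):
        # pick the attribute-value term maximising p(cls | attribute = value)
        best, best_p = None, -1.0
        for attr in set(covered[0][0]) - set(used):
            for value in {r[attr] for r, _ in covered}:
                subset = [l for r, l in covered if r[attr] == value]
                p = sum(l == cls for l in subset) / len(subset)
                if p > best_p:
                    best, best_p = (attr, value), p
        return best

    def predict(self, row):
        for terms, cls in self.rules:                     # first matching rule wins
            if all(row[a] == v for a, v in terms):
                return cls
        return self.default


class RandomPrismEnsemble:
    """Bagged Prism classifiers on random attribute subsets, combined by
    a simple (unweighted) majority vote -- an assumed combination scheme."""

    def __init__(self, n_estimators=10, seed=0):
        self.n = n_estimators
        self.rng = random.Random(seed)

    def fit(self, rows, labels):
        attrs = list(rows[0])
        k = max(1, int(len(attrs) ** 0.5))    # sqrt(d) subset size: an assumption
        self.members = []
        for _ in range(self.n):
            sub = self.rng.sample(attrs, k)   # random attribute subset
            idx = [self.rng.randrange(len(rows)) for _ in range(len(rows))]
            boot = [{a: rows[i][a] for a in sub} for i in idx]
            model = PrismClassifier().fit(boot, [labels[i] for i in idx])
            self.members.append((model, sub))
        return self

    def predict(self, row):
        votes = Counter(m.predict({a: row[a] for a in sub})
                        for m, sub in self.members)
        return votes.most_common(1)[0][0]
```

In this sketch, bagging supplies the variance reduction that counters overfitting, while the random attribute subsets decorrelate the base classifiers, mirroring the two ingredients of Random Forests; how the members' votes are weighted (here deliberately unweighted) is one of the design choices an actual implementation would have to settle.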
