Different Ways of Weakening Decision Trees and Their Impact on Classification Accuracy of DT Combination

Recent classifier combination frameworks have proposed several ways of weakening a learning set and have shown that these weakening methods improve prediction accuracy. In this paper we focus on learning set sampling (Breiman's bagging) and random feature subset selection (Bay's Multiple Feature Subsets). We present a combination scheme labeled 'Bagfs', in which new learning sets are generated from both bootstrap replicates and selected feature subsets. The performance of the three methods (Bagging, MFS and Bagfs) is assessed with a decision-tree inducer (C4.5) and a majority voting rule. We also study whether the way in which the weak classifiers are created has a significant influence on the performance of their combination. To answer this question, we applied the Cochran Q test, which compares the three weakening methods together on a given database and indicates whether or not they differ significantly. We also used the McNemar test to compare the algorithms pair by pair. The first results, obtained on 14 conventional databases, show that on average Bagfs exhibits the best agreement between prediction and supervision. The Cochran Q test indicated that the way the weak classifiers are created significantly influenced combination performance on at least 4 of the 14 databases analyzed.
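As a concrete illustration of the scheme described above, the sketch below builds a Bagfs-style ensemble (each member tree is trained on a bootstrap replicate drawn over a randomly selected feature subset, and the members are combined by majority vote) and then applies the Cochran Q and McNemar tests to the members' 0/1 correctness records. It is a minimal sketch, not the paper's implementation: it assumes scikit-learn's DecisionTreeClassifier as a stand-in for C4.5, integer-coded class labels, and illustrative values for the ensemble size and feature-subset fraction; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.tree import DecisionTreeClassifier  # stand-in for the C4.5 inducer


def bagfs_fit(X, y, n_estimators=25, subset_frac=0.5, seed=None):
    """Train each tree on a bootstrap replicate restricted to a random feature subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(subset_frac * d))
    models = []
    for _ in range(n_estimators):
        rows = rng.integers(0, n, size=n)            # bootstrap replicate of the learning set
        cols = rng.choice(d, size=k, replace=False)  # random feature subset
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        models.append((tree, cols))
    return models


def bagfs_predict(models, X):
    """Combine member predictions by unweighted majority vote (integer-coded labels assumed)."""
    votes = np.stack([tree.predict(X[:, cols]) for tree, cols in models]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

The two significance tests compare classifiers through their per-example correctness (1 = correct, 0 = wrong): Cochran's Q tests all methods at once on a given database, while McNemar compares a single pair with a continuity correction.

```python
def cochran_q(correct):
    """Cochran Q test on an (n_examples, n_classifiers) 0/1 correctness matrix."""
    correct = np.asarray(correct, dtype=float)
    n, k = correct.shape
    col = correct.sum(axis=0)                         # successes per classifier
    row = correct.sum(axis=1)                         # successes per example
    denom = k * row.sum() - (row ** 2).sum()          # assumes denom > 0 (classifiers do not all agree everywhere)
    q = k * (k - 1) * ((col - col.mean()) ** 2).sum() / denom
    return q, chi2.sf(q, k - 1)                       # statistic and p-value (k-1 degrees of freedom)


def mcnemar(correct_a, correct_b):
    """McNemar test (continuity-corrected) on two 0/1 correctness vectors."""
    b = int(np.sum((correct_a == 1) & (correct_b == 0)))
    c = int(np.sum((correct_a == 0) & (correct_b == 1)))
    stat = (abs(b - c) - 1) ** 2 / (b + c) if (b + c) else 0.0
    return stat, chi2.sf(stat, 1)
```

On a held-out test set, correct[i, j] would be 1 when classifier j predicts example i correctly; a significant Q, followed by pairwise McNemar tests, mirrors the comparison protocol outlined in the abstract.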

[1]  G. H. Rosenfield, et al.  A coefficient of agreement as a measure of thematic classification accuracy, 1986.

[2]  Catherine Blake, et al.  UCI Repository of machine learning databases, 1998.

[3]  Bernard R. Rosner, et al.  Fundamentals of Biostatistics, 1992.

[4]  Roberto Battiti, et al.  Democracy in neural nets: Voting schemes for classification, 1994, Neural Networks.

[5]  Stephen D. Bay  Combining Nearest Neighbor Classifiers Through Multiple Feature Subsets, 1998, ICML.

[6]  J. Ross Quinlan, et al.  Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[7]  Sargur N. Srihari, et al.  Decision Combination in Multiple Classifier Systems, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[8]  Stephen D. Bay  Nearest neighbor classification from multiple feature subsets, 1999, Intell. Data Anal.

[9]  J. Ross Quinlan, et al.  C4.5: Programs for Machine Learning, 1992.

[10]  Zijian Zheng, et al.  Generating Classifier Committees by Stochastically Selecting both Attributes and Training Examples, 1998, PRICAI.

[11]  Ron Kohavi, et al.  Option Decision Trees with Majority Votes, 1997, ICML.

[12]  Thomas G. Dietterich  An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees, 2000.

[13]  Yoav Freund, et al.  Experiments with a New Boosting Algorithm, 1996, ICML.

[16]  S. Siegel, et al.  Nonparametric Statistics for the Behavioral Sciences, 1956.

[17]  Ching Y. Suen, et al.  A Method of Combining Multiple Experts for the Recognition of Unconstrained Handwritten Numerals, 1995, IEEE Trans. Pattern Anal. Mach. Intell.