Using multiple measures to predict confidence in instance classification

Selecting an effective method for combining the votes of classifiers in an ensemble can have a significant impact on the ensemble's overall classification accuracy. Some combination methods fail even to match the accuracy of the ensemble's single most accurate member. To address this issue, we present the strategy of Aggregate Confidence Ensembles, which uses multiple measures to estimate a classifier's confidence in its predictions on an instance-by-instance basis. Using these confidence estimators to weight the votes in an ensemble yields, on average, higher classification accuracy than the most accurate classifier in the ensemble. These aggregate measures also achieve higher classification accuracy than the individual confidence estimates used on their own. Aggregate Confidence Ensembles outperform three baseline ensemble-creation strategies, as well as the methods of Modified Stacking and Arbitration, both in average classification accuracy and in algorithm-by-algorithm accuracy comparisons over 36 data sets.
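As a rough illustration of the idea (not the paper's implementation), the sketch below weights each classifier's vote by the average of two per-instance confidence estimates: the model's own predicted-class probability, and its accuracy on the query instance's nearest training neighbors. The data set, the three base classifiers, the choice of these two particular estimators, and simple averaging as the aggregation step are all assumptions made for the sketch; the paper's aggregate could combine any number of confidence measures.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical sketch: each classifier's vote is weighted by the mean of
# two per-instance confidence estimates (predicted-class probability and
# local accuracy on the query's nearest training neighbors).

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [DecisionTreeClassifier(random_state=0),
          GaussianNB(),
          KNeighborsClassifier(5)]
for m in models:
    m.fit(X_tr, y_tr)

# Used only to look up each query's neighbors in the training set.
nn = KNeighborsClassifier(7).fit(X_tr, y_tr)

def local_accuracy(model, x):
    """Confidence estimate 2: model's accuracy on the query's neighbors."""
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    return (model.predict(X_tr[idx]) == y_tr[idx]).mean()

n_classes = len(np.unique(y))
correct = 0
for x, true in zip(X_te, y_te):
    votes = np.zeros(n_classes)
    for m in models:
        proba = m.predict_proba(x.reshape(1, -1))[0]
        j = int(np.argmax(proba))
        pred = int(m.classes_[j])  # map probability index to class label
        # Aggregate confidence for this instance: mean of the two estimates.
        conf = (proba[j] + local_accuracy(m, x)) / 2.0
        votes[pred] += conf        # confidence-weighted vote
    correct += int(np.argmax(votes) == true)

print("weighted-vote accuracy:", correct / len(y_te))
```

The key design point the sketch tries to capture is that the weight is recomputed per instance, so a classifier that is confident in one region of the input space can dominate the vote there while deferring elsewhere.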
