Optimal Classifier Fusion in a Non-Bayesian Probabilistic Framework

Combining the outputs of several classifiers is a common strategy for improving classification rates in general-purpose classification systems. Some of the most common approaches can be explained using Bayes' formula. In this paper, we tackle the problem of classifier combination using a non-Bayesian probabilistic framework. This approach allows us to derive two linear combination rules that minimize misclassification rates under some constraints on the distribution of classifiers. To show the validity of this approach, we compare it with other popular combination rules, theoretically using a synthetic data set and experimentally using two standard databases: the MNIST handwritten digit database and the GREC symbol database. Results on the synthetic data set support the theoretical approach, and results on real data show that the proposed methods outperform other common combination schemes.
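As a rough illustration of what a linear combination rule looks like in general, the following Python sketch fuses per-class scores from several classifiers with a weighted sum and picks the highest-scoring class. It is not the pair of rules derived in the paper: the function name `linear_fusion`, the example scores, and the weights are all hypothetical, and the weights here are chosen by hand rather than optimized.

```python
import numpy as np


def linear_fusion(scores, weights):
    """Fuse classifier outputs with a weighted sum (illustrative only).

    scores  : array-like, shape (n_classifiers, n_samples, n_classes),
              per-class scores (e.g. posterior estimates) from each classifier.
    weights : array-like, shape (n_classifiers,), non-negative weights.
              These are hypothetical; the paper derives its own weights.
    Returns the predicted class index per sample and the fused scores.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract the classifier axis: result has shape (n_samples, n_classes).
    fused = np.tensordot(weights, scores, axes=1)
    return fused.argmax(axis=1), fused


# Made-up outputs of two classifiers on two samples and three classes.
scores = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # classifier 1
    [[0.3, 0.5, 0.2], [0.1, 0.2, 0.7]],  # classifier 2
]

# Equal weights reduce to the familiar mean (sum) rule.
mean_labels, _ = linear_fusion(scores, [1.0, 1.0])

# Unequal weights trust classifier 1 more and can change the decision.
weighted_labels, weighted_scores = linear_fusion(scores, [0.75, 0.25])
```

With equal weights the function reproduces the simple averaging rule; any other choice of weights gives a different linear rule, which is the family of combiners the abstract refers to.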
