A Study on Bayes Feature Fusion for Image Classification

We consider the problem of image classification when more than one visual feature is available. In this case, Bayes fusion offers an attractive solution by combining the results of different classifiers (one classifier per feature). This is a general form of the so-called "naive Bayes" approach. Analyzing the performance of Bayes fusion relative to a Bayesian classifier over the joint feature distribution, however, is tricky. On the one hand, it is well known that the latter has lower bias than the former, unless the features are conditionally independent, in which case the two coincide. On the other hand, as noted by Friedman, the low variance associated with naive Bayes estimation may dramatically mitigate the effect of its bias. In this paper, we attempt to assess the tradeoff between these two factors by means of experimental tests on two image data sets using color and texture features. Our results suggest that (1) the difference between the correct classification rates obtained with Bayes fusion and with the joint feature distribution is a function of the conditional dependence of the features (measured in terms of mutual information); however, (2) for small training set sizes, Bayes fusion performs almost as well as the classifier on the joint distribution.
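For concreteness, here is a minimal sketch of the two decision rules being compared (the notation is ours, not taken verbatim from the paper): given features x_1, ..., x_d and class label c, Bayes fusion classifies according to

    \hat{c}_{\text{fusion}} = \arg\max_c \; P(c) \prod_{i=1}^{d} p(x_i \mid c),

whereas the classifier built on the joint feature distribution uses

    \hat{c}_{\text{joint}} = \arg\max_c \; P(c) \, p(x_1, \dots, x_d \mid c).

The two rules coincide exactly when p(x_1, \dots, x_d \mid c) = \prod_i p(x_i \mid c), i.e., when the features are conditionally independent given the class; the class-conditional mutual information I(x_1; x_2 \mid c) quantifies how far the data depart from this assumption.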

[1] Elie Bienenstock et al., Neural Networks and the Bias/Variance Dilemma, Neural Computation, 1992.

[2] Josef Kittler et al., A weighted combination of classifiers employing shared and distinct representations, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1998.

[3] Jiri Matas et al., On Combining Classifiers, IEEE Trans. Pattern Anal. Mach. Intell., 1998.

[4] Leo Breiman et al., Classification and Regression Trees, 1984.

[5] Yoshua Bengio et al., Pattern Recognition and Neural Networks, 1995.

[6] David H. Wolpert et al., On Bias Plus Variance, Neural Computation, 1997.

[7] David D. Lewis et al., Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval, ECML, 1998.

[8] Daniel S. Hirschberg et al., Small Sample Statistics for Classification Error Rates I: Error Rate Measurements, 1996.

[9] James M. Rehg et al., Statistical Color Models with Application to Skin Detection, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1999.

[10] William S. Cooper et al., Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval, ACM Trans. on Information Systems (TOIS), 1995.

[11] Pedro M. Domingos et al., On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Machine Learning, 1997.

[12] Tom Heskes et al., Bias/Variance Decompositions for Likelihood-Based Estimators, Neural Computation, 1998.

[13] R. Manduchi, Classification Experiments on Real-World Texture, 2001.

[14] Jerome H. Friedman et al., On Bias, Variance, 0/1-Loss, and the Curse-of-Dimensionality, Data Mining and Knowledge Discovery, 2004.

[15] Padhraic Smyth et al., Clustering Using Monte Carlo Cross-Validation, KDD, 1996.

[16] L. Ryd et al., On bias, Acta Orthopaedica Scandinavica, 1994.

[17] Nir Friedman et al., Bayesian Network Classifiers, Machine Learning, 1997.

[18] B. S. Manjunath et al., Texture Features for Browsing and Retrieval of Image Data, IEEE Trans. Pattern Anal. Mach. Intell., 1996.

[19] Shinichi Morishita et al., On Classification and Regression, Discovery Science, 1998.

[20] Pat Langley et al., Induction of Selective Bayesian Classifiers, UAI, 1994.