CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines

We propose a content-based soft annotation (CBSA) procedure for providing images with semantic labels. The procedure starts by labeling a small set of training images, each with a single semantic label (e.g., forest, animal, or sky). An ensemble of binary classifiers is then trained to predict label membership for images. The trained ensemble is applied to each individual image to give it multiple soft labels, each associated with a label membership factor. To select a base binary classifier for CBSA, we experiment with two learning methods, support vector machines (SVMs) and Bayes point machines (BPMs), and compare their class-prediction accuracy. Our empirical study on a 116-category, 25K-image set shows that the BPM-based ensemble provides better annotation quality than the SVM-based ensemble for supporting multimodal image retrieval.
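
The annotation pipeline described above (single-label training images, one binary classifier per label, then soft labels with membership factors) can be illustrated with a minimal sketch. It is not the authors' implementation: the base classifier here is a plain logistic-regression scorer standing in for an SVM or BPM, the three-dimensional feature vectors are made-up toy data, and the names `train_binary`, `train_ensemble`, and `soft_annotate` are hypothetical. Normalizing the per-label scores so that they sum to one is also an assumption; the abstract does not say how membership factors are computed.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, bias, x):
    """Linear score of feature vector x, squashed into (0, 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_binary(features, targets, epochs=200, lr=0.5):
    """Fit one binary scorer by stochastic gradient descent.

    Assumption: logistic regression is a stand-in for the SVM or BPM
    base classifier compared in the paper.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(features, targets):
            grad = score(weights, bias, x) - t
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def train_ensemble(features, labels):
    """Train one one-vs-rest binary classifier per semantic label."""
    ensemble = {}
    for label in sorted(set(labels)):
        targets = [1.0 if l == label else 0.0 for l in labels]
        ensemble[label] = train_binary(features, targets)
    return ensemble


def soft_annotate(ensemble, x):
    """Return {label: membership factor} for one image.

    Assumption: factors are the per-label scores normalized to sum to 1.
    """
    raw = {label: score(w, b, x) for label, (w, b) in ensemble.items()}
    total = sum(raw.values())
    return {label: s / total for label, s in raw.items()}


# Toy training set: each image has exactly one label (hypothetical features).
train_x = [
    (0.9, 0.1, 0.1), (0.8, 0.2, 0.1),   # forest
    (0.1, 0.9, 0.1), (0.2, 0.8, 0.1),   # sky
    (0.1, 0.1, 0.9), (0.2, 0.1, 0.8),   # animal
]
train_y = ["forest", "forest", "sky", "sky", "animal", "animal"]

ensemble = train_ensemble(train_x, train_y)

# An image that resembles both forest and sky receives several soft labels.
for label, factor in sorted(soft_annotate(ensemble, (0.5, 0.5, 0.0)).items()):
    print(f"{label}: {factor:.3f}")
```

Because every image ends up with a membership factor for every label, a keyword query can rank images by the factor of the queried label rather than filter on a single hard label, which is one plausible way soft annotation could support multimodal retrieval.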
