A Selective Weighted Late Fusion for Visual Concept Recognition

We propose a novel multimodal approach that automatically predicts the visual concepts of images through an effective fusion of visual and textual features. It relies on a Selective Weighted Late Fusion (SWLF) scheme which, by optimizing the overall Mean interpolated Average Precision (MiAP), learns to select and weight the best experts for each visual concept to be recognized. Experiments were conducted on the MIR Flickr image collection within the ImageCLEF 2011 Photo Annotation challenge. The results demonstrate the effectiveness of SWLF: it achieved a MiAP of 43.69% for the detection of the 99 visual concepts, ranking 2nd out of the 79 submitted runs, while our new variant of SWLF reaches a MiAP of 43.93%.
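
To make the idea of per-concept expert selection and weighting concrete, here is a minimal Python sketch. It is not the authors' implementation; the abstract does not specify the selection procedure. The sketch assumes that experts are ranked per concept by their interpolated average precision (iAP) on a validation set, that the top-k experts are weighted by normalized iAP, and that k is chosen to maximize the iAP of the fused scores. The function names (`interpolated_ap`, `swlf_fit`, `swlf_predict`) and the array layout are hypothetical.

```python
import numpy as np

def interpolated_ap(scores, labels):
    """Interpolated average precision of one ranked list (labels are 0/1)."""
    order = np.argsort(-scores, kind="stable")
    rel = labels[order].astype(float)
    n_pos = rel.sum()
    if n_pos == 0:
        return 0.0
    precision = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    # Interpolated precision at rank r: best precision at any rank >= r.
    interp = np.maximum.accumulate(precision[::-1])[::-1]
    return float((interp * rel).sum() / n_pos)

def swlf_fit(val_scores, val_labels):
    """Learn per-concept expert weights on a validation set (assumed procedure).

    val_scores: (n_experts, n_samples, n_concepts) expert output scores
    val_labels: (n_samples, n_concepts) ground truth in {0, 1}
    Returns weights of shape (n_concepts, n_experts); unselected experts get 0.
    """
    n_experts, _, n_concepts = val_scores.shape
    weights = np.zeros((n_concepts, n_experts))
    for c in range(n_concepts):
        y = val_labels[:, c]
        aps = np.array([interpolated_ap(val_scores[e, :, c], y)
                        for e in range(n_experts)])
        ranked = np.argsort(-aps, kind="stable")  # best expert first
        best_ap, best_w = -1.0, None
        for k in range(1, n_experts + 1):
            top = ranked[:k]
            w = np.zeros(n_experts)
            total = aps[top].sum()
            # Weight selected experts by normalized iAP (uniform if all are 0).
            w[top] = aps[top] / total if total > 0 else 1.0 / k
            fused = w @ val_scores[:, :, c]
            ap = interpolated_ap(fused, y)
            if ap > best_ap:  # keep the smallest k that reaches the best iAP
                best_ap, best_w = ap, w
        weights[c] = best_w
    return weights

def swlf_predict(weights, scores):
    """Fuse expert scores (n_experts, n_samples, n_concepts) per concept."""
    return np.einsum("ce,esc->sc", weights, scores)
```

Because a single expert (k = 1) is always among the candidates, the fused iAP on the validation set is never lower than that of the best individual expert for that concept under these assumptions; whether the gain carries over to test data depends on how well the validation set represents it.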
