Optimization of Robust Loss Functions for Weakly-Labeled Image Taxonomies

The recently proposed ImageNet dataset consists of several million images, each annotated with a single object category. These annotations may be imperfect, in the sense that many images contain multiple objects belonging to the label vocabulary. In other words, we have a multi-label problem but the annotations include only a single label (which is not necessarily the most prominent). Such a setting motivates the use of a robust evaluation measure, which allows for a limited number of labels to be predicted and, so long as one of the predicted labels is correct, the overall prediction should be considered correct. This is indeed the type of evaluation measure used to assess algorithm performance in a recent competition on ImageNet data. Optimizing such types of performance measures presents several hurdles even with existing structured output learning methods. Indeed, many of the current state-of-the-art methods optimize the prediction of only a single output label, ignoring this ‘structure’ altogether. In this paper, we show how to directly optimize continuous surrogates of such performance measures using structured output learning techniques with latent variables. We use the output of existing binary classifiers as input features in a new learning stage which optimizes the structured loss corresponding to the robust performance measure. We present empirical evidence that this allows us to ‘boost’ the performance of binary classification on a variety of weakly-supervised labeling problems defined on image taxonomies.

[1]  Stefanie Nowak,et al.  New Strategies for Image Annotation: Overview of the Photo Annotation Task at ImageCLEF 2010 , 2010, CLEF.

[2]  Cordelia Schmid,et al.  Image annotation with tagprop on the MIRFLICKR set , 2010, MIR '10.

[3]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[5]  Cordelia Schmid,et al.  Constructing Category Hierarchies for Visual Recognition , 2008, ECCV.

[6]  Motoaki Kawanabe,et al.  On Taxonomies for Multi-class Image Categorization , 2012, International Journal of Computer Vision.

[7]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[8]  E. Ziegel,et al.  Artificial intelligence and statistics , 1986 .

[9]  Saso Dzeroski,et al.  Detection of Visual Concepts and Annotation of Images Using Ensembles of Trees for Hierarchical Multi-Label Classification , 2010, ICPR Contests.

[10]  Hans Burkhardt,et al.  Learning Taxonomies in Large Image Databases , 2007 .

[11]  Matthew B. Blaschko,et al.  Simultaneous Object Detection and Ranking with Weak Supervision , 2010, NIPS.

[12]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[14]  Pietro Perona,et al.  Unsupervised learning of visual taxonomies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Alan L. Yuille,et al.  The Concave-Convex Procedure (CCCP) , 2001, NIPS.

[16]  Stefanie Nowak,et al.  The CLEF 2011 Photo Annotation and Concept-based Retrieval Tasks , 2011, CLEF.

[17]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[18]  Bart Thomee,et al.  New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[19]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[20]  Thorsten Joachims,et al.  Training structural svms with kernels using sampled cuts , 2008, KDD.

[21]  Fei-Fei Li,et al.  Hierarchical semantic indexing for large scale image retrieval , 2011, CVPR 2011.

[22]  Thomas Deselaers,et al.  Visual and semantic similarity in ImageNet , 2011, CVPR 2011.

[23]  John Langford,et al.  Hash Kernels , 2009, AISTATS.

[24]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Ming Yang,et al.  Large-scale image classification: Fast feature extraction and SVM training , 2011, CVPR 2011.

[26]  Victor Lavrenko,et al.  Optimal Tag Sets for Automatic Image Annotation , 2011, BMVC.

[27]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[28]  Selim Aksoy,et al.  Recognizing Patterns in Signals, Speech, Images and Videos , 2010, Lecture Notes in Computer Science.

[29]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[30]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[31]  Silvio Savarese,et al.  Hierarchical classification of images by sparse approximation , 2013, Image Vis. Comput..

[32]  Alexander J. Smola,et al.  A scalable modular convex solver for regularized risk minimization , 2007, KDD '07.

[33]  Chris H. Q. Ding,et al.  Image annotation using bi-relational graph of images and semantic labels , 2011, CVPR 2011.

[34]  Motoaki Kawanabe,et al.  Multi-modal visual concept classification of images via Markov random walk over tags , 2011, 2011 IEEE Workshop on Applications of Computer Vision (WACV).

[35]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[36]  Alexei A. Efros,et al.  Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[38]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[39]  Gabriela Csurka,et al.  Learning structured prediction models for interactive image labeling , 2011, CVPR 2011.

[40]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Tibério S. Caetano,et al.  Optimization of Robust Loss Functions for Weakly-Labeled Image Taxonomies , 2011, International Journal of Computer Vision.

[42]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[43]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[44]  Fei-Fei Li,et al.  Attribute Learning in Large-Scale Datasets , 2010, ECCV Workshops.

[45]  Rong Jin,et al.  Multi-label learning with incomplete class assignments , 2011, CVPR 2011.

[46]  Florent Perronnin,et al.  High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[47]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[49]  Thomas B. Moeslund,et al.  Long-Term Occupancy Analysis Using Graph-Based Optimisation in Thermal Imagery , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.