Multi-label multi-instance learning with missing object tags

In this paper, a novel framework is developed for leveraging large-scale loosely tagged images for object classifier training by addressing three key issues jointly: (a) spam tags, i.e., some tags are more related to popular query terms than to the image semantics; (b) loose object tags, i.e., multiple object tags are given loosely at the image level without identifying the object locations in the images; (c) missing object tags, i.e., some object tags are missing, so negative bags may contain positive instances. To address these three issues jointly, our framework consists of the following key components: (1) distributed image clustering and inter-cluster visual correlation analysis, which handle spam tags by automatically filtering out large numbers of junk images; (2) multiple instance learning with missing tag prediction, which deals with loose object tags and missing object tags jointly; (3) structural learning, which leverages the inter-object visual correlations to train large numbers of inter-related object classifiers jointly. Our experiments on large-scale loosely tagged images have provided very positive results.
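To make the bag terminology concrete, the following is a minimal sketch of how a missing object tag might be flagged under the standard multiple-instance assumption (an image is a bag of region instances, and a bag is positive if at least one instance is positive). It is not the method of this paper: the noisy-OR bag model is one common choice in the multiple-instance literature, and the function names, the per-region probabilities, and the `threshold` value are all hypothetical.

```python
from typing import Sequence


def noisy_or(instance_probs: Sequence[float]) -> float:
    """Bag-level probability under a noisy-OR model (assumed here).

    The bag is positive unless every instance is negative:
    P(bag) = 1 - prod_i (1 - p_i).
    """
    prob_all_negative = 1.0
    for p in instance_probs:
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


def suspect_missing_tag(
    instance_probs: Sequence[float],
    tagged: bool,
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> bool:
    """Flag an image that lacks the object tag but whose regions
    strongly suggest the object is present (a "negative bag" that
    may contain positive instances)."""
    return (not tagged) and noisy_or(instance_probs) >= threshold


# An untagged image with one high-scoring region is flagged;
# the same image is not flagged if it already carries the tag.
flag_untagged = suspect_missing_tag([0.95, 0.10], tagged=False)
flag_tagged = suspect_missing_tag([0.95, 0.10], tagged=True)
```

In this sketch, an image flagged by `suspect_missing_tag` would be treated as a candidate positive bag rather than a trusted negative one; how such candidates are actually used during training is left open here.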
