Sampling and Ontologically Pooling Web Images for Visual Concept Learning

Sufficient training examples are essential for effective learning of semantic visual concepts. In practice, however, acquiring noise-free training examples has always been expensive. Recently the rapid popularization of social media websites, such as Flickr, has made it possible to collect training exemplars without human assistance. This paper proposes a novel and efficient approach to collect training samples from the noisily tagged Web images for visual concept learning, where we try to maximize two important criteria, relevancy and coverage, of the automatically generated training sets. For the former, a simple method named semantic field is introduced to handle the imprecise and incomplete image tags. Specifically, the relevancy of an image to a target concept is predicted by collectively analyzing the associated tag list of the image using two knowledge sources: WordNet corpus and statistics from Flickr.com. To boost the coverage or diversity of the training sets, we further propose an ontology-based hierarchical pooling method to collect samples not only based on the target concept alone, but also from ontologically neighboring concepts. Extensive experiments on three different datasets (NUS-WIDE, PASCAL VOC, and ImageNet) demonstrate the effectiveness of our proposed approach, producing competitive performance even when comparing with concept classifiers learned using expert- labeled training examples.

[1]  Xirong Li,et al.  Visual categorization with negative examples for free , 2009, ACM Multimedia.

[2]  Chong-Wah Ngo,et al.  Semantic context transfer across heterogeneous sources for domain adaptive video search , 2009, ACM Multimedia.

[3]  Chong-Wah Ngo,et al.  Columbia University/VIREO-CityU/IRIT TRECVID2008 High-Level Feature Extraction and Interactive Video Search , 2008, TRECVID.

[4]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[5]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.

[6]  Sasha Scambler In search of a label , 2014 .

[7]  Antonio Torralba,et al.  Semantic Label Sharing for Learning with Many Categories , 2010, ECCV.

[8]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[9]  Shih-Fu Chang,et al.  To search or to label?: predicting the performance of search-based automatic image classifiers , 2006, MIR '06.

[10]  Rong Yan,et al.  Semi-supervised cross feature learning for semantic concept detection in videos , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[12]  Kilian Q. Weinberger,et al.  Reliable tags using image similarity: mining specificity and expertise from large-scale multimedia databases , 2009, WSMC '09.

[13]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[14]  Markus Koch,et al.  Learning automatic concept detectors from online video , 2010, Comput. Vis. Image Underst..

[15]  Nicu Sebe,et al.  A new analysis of the value of unlabeled data in semi-supervised learning for image retrieval , 2004, 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763).

[16]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Shih-Fu Chang,et al.  Label diagnosis through self tuning for web image search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Ramesh C. Jain,et al.  Image annotation by kNN-sparse graph-based label propagation over noisily tagged web images , 2011, TIST.

[21]  Adrian Ulges,et al.  Relevance filtering meets active learning: improving web-based concept detectors , 2010, MIR '10.

[22]  Gang Wang,et al.  OPTIMOL: automatic Online Picture collecTion via Incremental MOdel Learning , 2007, CVPR.

[23]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[24]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[25]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[26]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[27]  Adrian Ulges,et al.  Identifying relevant frames in weakly labeled videos for training concept detectors , 2008, CIVR '08.

[28]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[29]  Marcel Worring,et al.  Social negative bootstrapping for visual categorization , 2011, ICMR '11.

[30]  Gang Wang,et al.  On the sampling of web images for learning visual concept classifiers , 2010, CIVR '10.

[31]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[32]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[33]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[34]  Cees Snoek,et al.  Can social tagged images aid concept-based video search? , 2009, 2009 IEEE International Conference on Multimedia and Expo.

[35]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[36]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[37]  Chong-Wah Ngo,et al.  Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study , 2010, IEEE Transactions on Multimedia.

[38]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[39]  Jianping Fan,et al.  Harvesting large-scale weakly-tagged image databases from the web , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Adrian Ulges,et al.  Visual Concept Learning from Weakly Labeled Web Videos , 2010, Video Search and Mining.

[41]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).