Extracting Visual Knowledge from the Internet: Making Sense of Image Data

Recent successes in visual recognition can be attributed primarily to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less attention has been paid to the third. Because manual data labeling is costly, even recent efforts such as ImageNet remain relatively small with respect to daily applications. In this work, we focus on automatically generating, at very large scale, image data that identifies a given visual concept. With the generated image data, we can train a robust recognition model for that concept. We evaluate the proposed webly supervised approach on the benchmark Pascal VOC 2007 dataset, and the results demonstrate the superiority of our method over many other state-of-the-art methods for image data collection.
