Prajna: Towards Recognizing Whatever You Want from Images without Image Labeling

With advances in distributed computation, machine learning, and deep neural networks, we have entered an era in which it is possible to build a real-world image recognition system. Building such a system requires three essential components: 1) creating representative features, 2) designing powerful learning approaches, and 3) identifying massive training data. While extensive research has been done on the first two aspects, much less attention has been paid to the third. In this paper, we present Prajna, an end-to-end Web knowledge discovery system. Starting from an arbitrary set of entities as input, Prajna automatically crawls images from multiple sources, identifies images that are reliably labeled, trains models, and builds a recognition system capable of recognizing new images of any entity in the set. Given the high cost of manual data labeling, leveraging the massive yet noisy data on the Internet is a natural idea, but the practical engineering is highly challenging. Prajna focuses on separating reliable training data from the much larger volume of noisy data, which is key to extending an image recognition system to support arbitrary entities. We analyze the intrinsic characteristics of Internet image data and find ways to mine accurate and informative data from it to build a training set, which is then used to train image recognition models. Prajna can automatically build an image recognition system for a set of entities as long as a sufficient number of images of those entities can be collected on the Web.
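As a rough illustration of what "separating reliable training data from noisy data" can mean when images are crawled from multiple sources, the sketch below keeps an image for an entity only when several independent sources attach the same label to it. The abstract does not specify Prajna's actual reliability criterion; the cross-source agreement rule, the function name `select_reliable`, the `min_sources` threshold, and the sample source names are all assumptions made for this example.

```python
from collections import defaultdict


def select_reliable(candidates, min_sources=2):
    """Keep (image, entity) pairs that enough distinct sources agree on.

    candidates: iterable of (image_id, entity, source) tuples, one per
    crawled label. The agreement rule is an assumption for illustration,
    not the criterion used by Prajna.
    Returns {entity: sorted list of image_ids}.
    """
    # Collect the set of sources that proposed each (image, entity) label.
    sources_by_pair = defaultdict(set)
    for image_id, entity, source in candidates:
        sources_by_pair[(image_id, entity)].add(source)

    # Keep only labels supported by at least `min_sources` distinct sources.
    reliable = defaultdict(list)
    for (image_id, entity), sources in sources_by_pair.items():
        if len(sources) >= min_sources:
            reliable[entity].append(image_id)
    return {entity: sorted(ids) for entity, ids in reliable.items()}


# Hypothetical crawl results: (image_id, entity, source).
crawled = [
    ("img1", "golden retriever", "search_engine_a"),
    ("img1", "golden retriever", "photo_site_b"),
    ("img2", "golden retriever", "search_engine_a"),
    ("img3", "eiffel tower", "search_engine_a"),
    ("img3", "eiffel tower", "photo_site_b"),
    ("img3", "golden retriever", "photo_site_b"),
]

training_set = select_reliable(crawled)
print(training_set)
```

Raising `min_sources` trades training-set size for label precision, which mirrors the tension the abstract describes between massive data and reliable data.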
