Active Learning from Noisy Tagged Images

Training successful image classification models requires large quantities of labelled examples, which are generally hard to obtain. On the other hand, the web provides an abundance of loosely labelled images, e.g. images tagged on websites such as Flickr. Although these images are cheap and massively available, their tags are typically noisy and unreliable. In an attempt to use such images for training a classifier, we propose a simple probabilistic model to learn a latent semantic space from which deep vector representations of the images and tags are generated. This latent space is subsequently used in an active learning framework, based on adaptive submodular optimisation, that selects informative images to be labelled. Afterwards, we update the classifier according to the importance of each labelled image, so as to best capture the information it provides. Through this simple approach, we are able to train a classifier that performs well using a fraction of the effort typically required for image labelling and classifier training.
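
The pipeline described above has two algorithmic steps: selecting which tagged images to send for labelling, and updating the classifier with an importance weight per labelled image. As a rough illustration only, the Python sketch below uses a facility-location coverage objective over the latent embeddings as a stand-in for the adaptive submodular selection criterion, and a weighted logistic-regression step as a stand-in for the classifier update; the function names (greedy_select, importance_weighted_update) and all numerical choices are our own assumptions, not the paper's implementation.

import numpy as np


def greedy_select(pool_embeddings, k):
    """Greedy maximisation of a facility-location objective over the pool.

    pool_embeddings: (n, d) array of latent image embeddings (L2-normalised).
    Returns indices of the k images whose coverage of the pool is largest;
    facility location stands in here for the adaptive submodular criterion.
    """
    sims = pool_embeddings @ pool_embeddings.T   # cosine similarities
    n = sims.shape[0]
    best_cover = np.zeros(n)                     # best similarity to any selected image
    selected = []
    for _ in range(k):
        # marginal gain of adding each candidate image to the selected set
        gains = np.maximum(sims, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf                # never re-select an image
        i = int(np.argmax(gains))
        selected.append(i)
        best_cover = np.maximum(best_cover, sims[i])
    return selected


def importance_weighted_update(w, x, y, weight, lr=0.1):
    """One SGD step on a logistic loss, scaled by the image's importance weight.

    y is a {-1, +1} label obtained from the annotator; 'weight' encodes how much
    the labelled image should influence the classifier.
    """
    margin = y * (w @ x)
    grad = -y * x / (1.0 + np.exp(margin))       # gradient of log(1 + exp(-margin))
    return w - lr * weight * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 32))               # stand-in for learned latent embeddings
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)

    picks = greedy_select(Z, k=10)               # images sent for labelling
    w = np.zeros(32)
    for i in picks:
        y = 1.0 if rng.random() < 0.5 else -1.0  # stand-in for the human-provided label
        w = importance_weighted_update(w, Z[i], y, weight=1.0)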
