What's it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations

Active learning strategies can be useful when manual labeling effort is scarce, as they select the most informative examples to be annotated first. However, for visual category learning, the active selection problem is particularly complex: a single image will typically contain multiple object labels, and an annotator could provide multiple types of annotation (e.g., class labels, bounding boxes, segmentations), any of which would incur a variable amount of manual effort. We present an active learning framework that predicts the tradeoff between the effort and information gain associated with a candidate image annotation, thereby ranking unlabeled and partially labeled images according to their expected “net worth” to an object recognition system. We develop a multi-label multiple-instance approach that accommodates multi-object images and a mixture of strong and weak labels. Since the annotation cost can vary depending on an image's complexity, we show how to improve the active selection by directly predicting the time required to segment an unlabeled image. Given a small initial pool of labeled data, the proposed method actively improves the category models with minimal manual intervention.

[1]  Kristen Grauman,et al.  Keywords to visual categories: Multiple-instance learning forweakly supervised object categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[3]  James T. Kwok,et al.  Marginalized Multi-Instance Kernels , 2007, IJCAI.

[4]  David A. Forsyth,et al.  Utility data annotation with Amazon Mechanical Turk , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[5]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[6]  Fei-Fei Li,et al.  Towards Scalable Dataset Construction: An Active Learning Approach , 2008, ECCV.

[7]  Oded Maron,et al.  Multiple-Instance Learning for Natural Scene Classification , 1998, ICML.

[8]  Yong Jae Lee,et al.  Foreground Focus: Finding Meaningful Features in Unlabeled Images , 2008, BMVC.

[9]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[10]  Rong Yan,et al.  Automatically labeling video data using multi-class active learning , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Mark Craven,et al.  Multiple-Instance Active Learning , 2007, NIPS.

[12]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[13]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[14]  Xian-Sheng Hua,et al.  Two-Dimensional Active Learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Eric Horvitz,et al.  Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning , 2007, IJCAI.

[16]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[18]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[19]  Trevor Darrell,et al.  Active Learning with Gaussian Processes for Object Categorization , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Michael I. Jordan,et al.  Fast Kernel Learning using Sequential Minimal Optimization , 2004 .

[21]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[22]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[23]  Chih-Jen Lin,et al.  Probability Estimates for Multi-class Classification by Pairwise Coupling , 2003, J. Mach. Learn. Res..

[24]  Zhi-Hua Zhou,et al.  Multi-Instance Multi-Label Learning with Application to Scene Classification , 2006, NIPS.

[25]  Kristen Grauman,et al.  Multi-Level Active Prediction of Useful Image Annotations for Recognition , 2008, NIPS.

[26]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Gert Cauwenberghs,et al.  Incremental and Decremental Support Vector Machine Learning , 2000, NIPS.

[28]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.