Bootstrapping Visual Categorization With Relevant Negatives

Learning classifiers for many visual concepts is important for image categorization and retrieval. Because a classifier tends to misclassify negative examples that are visually similar to positive ones, such misclassified and thus relevant negatives should be emphasized during learning. User-tagged images are abundant online, but which of them are the relevant negatives remains unclear. Sampling negatives at random is the de facto standard in the literature. In this paper, we go beyond random sampling by proposing Negative Bootstrap. Given a visual concept and a few positive examples, the new algorithm iteratively finds relevant negatives. Per iteration, we learn from a small proportion of many user-tagged images, yielding an ensemble of meta classifiers. For efficient classification, we introduce Model Compression such that the classification time is independent of the ensemble size. Compared with the state of the art, we obtain relative gains of 14% and 18% in mean average precision on two present-day benchmarks. For concept search in one million images, Model Compression reduces the search time from over 20 h to approximately 6 min. Its effectiveness and efficiency, without the need to manually label any negatives, make Negative Bootstrap appealing for learning better visual concept classifiers.
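The iterative procedure summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the meta classifier here is a toy linear model (difference of class means) standing in for whatever classifier the authors use, and the function names and the parameters `iterations`, `candidates`, and `keep` are hypothetical. The compression step shows only the simplest case, in which averaging linear models collapses into a single linear model, so scoring cost no longer depends on the ensemble size.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_linear(positives, negatives):
    # Toy stand-in for a meta classifier (an assumption, not the paper's model):
    # the decision boundary is the perpendicular bisector of the two class means.
    mp, mn = mean(positives), mean(negatives)
    w = [p - q for p, q in zip(mp, mn)]
    mid = [(p + q) / 2 for p, q in zip(mp, mn)]
    return w, -dot(w, mid)

def score(model, x):
    w, b = model
    return dot(w, x) + b

def ensemble_score(models, x):
    # Cost grows linearly with the number of meta classifiers.
    return sum(score(m, x) for m in models) / len(models)

def negative_bootstrap(positives, pool, iterations=5, candidates=50, keep=5, seed=0):
    # pool: features of user-tagged images assumed to be negative for the concept.
    rng = random.Random(seed)
    models = []
    for _ in range(iterations):
        # Look at only a small random portion of the pool per iteration.
        sample = rng.sample(pool, min(candidates, len(pool)))
        if models:
            # Relevant negatives: those the current ensemble scores highest,
            # i.e. the ones most likely to be misclassified as positive.
            sample.sort(key=lambda x: ensemble_score(models, x), reverse=True)
        relevant = sample[:keep]  # first iteration falls back to random negatives
        models.append(train_linear(positives, relevant))
    return models

def compress(models):
    # For linear models, the averaged ensemble equals one linear model.
    n = len(models)
    w = [sum(col) / n for col in zip(*(m[0] for m in models))]
    b = sum(m[1] for m in models) / n
    return w, b
```

With `models = negative_bootstrap(positives, pool)` and `single = compress(models)`, `score(single, x)` matches `ensemble_score(models, x)` up to floating-point error while evaluating only one model per image.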
