Active learning in multimedia annotation and retrieval: A survey

Active learning is a machine learning technique that selects the most informative samples for labeling and uses them as training data. It has been widely explored in the multimedia research community for its ability to reduce human annotation effort. In this article, we survey efforts to leverage active learning in multimedia annotation and retrieval, focusing on two application domains: image/video annotation and content-based image retrieval. We first briefly introduce the principle of active learning and then analyze the sample selection criteria. We categorize the existing sample selection strategies used in multimedia annotation and retrieval into five criteria: risk reduction, uncertainty, diversity, density, and relevance. We then introduce several classification models used in active learning-based multimedia annotation and retrieval, including semi-supervised learning, multilabel learning, and multiple instance learning. We also discuss several future trends in this research direction, in particular the cost analysis of human annotation and large-scale interactive multimedia annotation.
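The sample selection criteria named above can be made concrete with a small scoring sketch. This is a minimal illustration, not a method taken from the survey. It assumes a pool of unlabeled feature vectors and a current classifier's predicted positive probabilities. It combines four of the five criteria in a weighted sum and picks a batch greedily: uncertainty (binary entropy), density (mean RBF similarity to the rest of the pool), diversity (dissimilarity to samples already chosen for the batch), and relevance (predicted positive probability). The weights, the RBF kernel, and the names `select_batch`, `entropy`, and `rbf` are hypothetical. Risk reduction is omitted because estimating expected error reduction requires retraining the model once per candidate and per possible label.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical sketch: the weighted-sum combination of criteria and the RBF
# kernel are illustrative assumptions, not the survey's own method.


def entropy(p: float) -> float:
    """Binary entropy of a predicted positive probability (uncertainty criterion)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def rbf(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian (RBF) similarity in (0, 1]; an assumed choice of kernel."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(pool: List[Vector], probs: List[float], batch_size: int,
                 w_unc: float = 1.0, w_den: float = 1.0, w_div: float = 1.0,
                 w_rel: float = 0.0, gamma: float = 1.0) -> List[int]:
    """Greedily pick up to `batch_size` pool indices to send to a human annotator."""
    n = len(pool)
    # Density criterion: mean similarity to the rest of the pool, so that
    # representative samples are preferred over outliers.
    dens = [sum(rbf(pool[i], pool[j], gamma) for j in range(n) if j != i) / max(n - 1, 1)
            for i in range(n)]
    selected: List[int] = []

    def score(i: int) -> float:
        # Diversity criterion: penalize similarity to samples already in the batch.
        div = 1.0 - max((rbf(pool[i], pool[j], gamma) for j in selected), default=0.0)
        return (w_unc * entropy(probs[i])   # uncertainty
                + w_den * dens[i]           # density
                + w_div * div               # diversity
                + w_rel * probs[i])         # relevance (likely positives, for retrieval)

    candidates = list(range(n))
    while candidates and len(selected) < batch_size:
        best = max(candidates, key=score)  # ties go to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    probs = [0.5, 0.5, 0.9, 0.5]
    print(select_batch(pool, probs, 2, w_den=0.0, w_div=0.0))  # [0, 1]
    print(select_batch(pool, probs, 2, w_den=0.0))             # [0, 3]
```

Setting a weight to zero disables that criterion. In the example, uncertainty alone picks index 1, a near-duplicate of the first pick; adding the diversity term swaps it for the distant, equally uncertain point at index 3.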