Online Multi-Label Active Learning for Large-Scale Multimedia Annotation

Existing video search engines have not taken advantage of video content analysis and semantic understanding. In academia, video search research uses semantic annotation to approach content-based indexing, and we argue that this is a promising direction toward true content-based video search. However, because of the complexity of both video data and semantic concepts, existing automatic video annotation techniques still cannot handle large-scale video sets and large-scale concept sets in terms of either annotation accuracy or computation cost. To address this problem, we propose a scalable framework for annotation-based video search, together with a novel approach to large-scale semantic concept annotation: online multi-label active learning. The framework scales along both the video sample dimension and the concept label dimension. Large-scale unlabeled video samples are assumed to arrive consecutively in batches, and a preliminary multi-label classifier is built from an initial pre-labeled training set. For each arriving batch, a multi-label active learning engine automatically selects a set of unlabeled sample-label pairs, which are then manually annotated. An online learner then updates the current classifier by taking the newly labeled sample-label pairs into account. This process repeats until all data have arrived. During the process, new labels, even ones without any pre-labeled training samples, can be incorporated at any time. Experiments on the TRECVID dataset demonstrate the effectiveness and efficiency of the proposed framework.
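
The batch-wise loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier or the selection criterion, so this sketch substitutes assumptions of its own: one independent logistic model per label updated by stochastic gradient steps, and entropy-based uncertainty sampling over sample-label pairs. All names (`OnlineMultiLabelLearner`, `select_pairs`, `run_stream`, the `oracle` callback standing in for the human annotator, and the toy concepts "outdoor" and "person") are hypothetical and not taken from the paper.

```python
import math
import random


class OnlineMultiLabelLearner:
    """Assumed stand-in classifier: one logistic model per label, updated online."""

    def __init__(self, n_features, lr=0.5):
        self.n_features = n_features
        self.lr = lr
        self.weights = {}  # label -> weight vector; the last entry is the bias

    def add_label(self, label):
        # A new label starts at zero weights, so every sample scores p = 0.5.
        self.weights.setdefault(label, [0.0] * (self.n_features + 1))

    def predict_proba(self, x, label):
        w = self.weights[label]
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label, y):
        # One gradient step on the logistic loss for a single sample-label pair.
        err = self.predict_proba(x, label) - y
        w = self.weights[label]
        for i, xi in enumerate(x):
            w[i] -= self.lr * err * xi
        w[-1] -= self.lr * err


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_pairs(learner, batch, budget):
    """Pick the `budget` most uncertain (sample index, label) pairs in a batch."""
    scored = [(binary_entropy(learner.predict_proba(x, label)), i, label)
              for i, x in enumerate(batch) for label in learner.weights]
    scored.sort(key=lambda t: -t[0])
    return [(i, label) for _, i, label in scored[:budget]]


def run_stream(learner, batches, oracle, budget, new_labels=None):
    """Process batches in arrival order; new_labels maps batch index -> labels."""
    new_labels = new_labels or {}
    n_queries = 0
    for t, batch in enumerate(batches):
        for label in new_labels.get(t, []):
            learner.add_label(label)  # label enters with no pre-labeled samples
        for i, label in select_pairs(learner, batch, budget):
            y = oracle(batch[i], label)  # manual annotation of one pair
            learner.update(batch[i], label, y)
            n_queries += 1
    return n_queries


# Toy data: each hypothetical concept depends on the sign of one feature.
AXIS = {"outdoor": 0, "person": 1}


def oracle(x, label):
    return 1 if x[AXIS[label]] > 0 else 0


def random_batch(n):
    return [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(n)]


random.seed(0)
learner = OnlineMultiLabelLearner(n_features=2)
learner.add_label("outdoor")
for x in random_batch(5):  # initial pre-labeled training set
    learner.update(x, "outdoor", oracle(x, "outdoor"))

batches = [random_batch(10) for _ in range(4)]
n_queries = run_stream(learner, batches, oracle, budget=5,
                       new_labels={2: ["person"]})
```

In this sketch the label "person" is introduced at the third batch with no training samples. Because its predictions start at 0.5, its sample-label pairs have maximal entropy and are queried first, which is one simple way a newly added label can attract annotation effort.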
