Online multi-label active annotation: towards large-scale content-based video search

Existing video search engines have not taken advantage of video content analysis and semantic understanding. Video search in academia uses semantic annotation to approach content-based indexing, and we argue that this is a promising direction for enabling real content-based video search. However, due to the complexity of both video data and semantic concepts, existing automatic video annotation techniques still cannot handle large-scale video sets and large-scale concept sets, in terms of either annotation accuracy or computation cost. To address this problem, we propose a scalable framework for annotation-based video search, together with a novel approach to large-scale semantic concept annotation: online multi-label active learning. The framework scales along both the video sample dimension and the concept label dimension. Large-scale unlabeled video samples are assumed to arrive consecutively in batches, starting from an initial pre-labeled training set on which a preliminary multi-label classifier is built. For each arriving batch, a multi-label active learning engine automatically selects a set of unlabeled sample-label pairs, which are then manually annotated. An online learner then updates the original classifier by taking the newly labeled sample-label pairs into account. This process repeats until all data have arrived. During the process, new labels, even those without any pre-labeled training samples, can be incorporated at any time. Experiments on the TRECVID dataset demonstrate the effectiveness and efficiency of the proposed framework.
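
The batch-wise loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-label logistic models, the uncertainty-based selection rule, and every name below (`OnlineMultiLabelLearner`, `select_pairs`, `run_stream`, `oracle`, `budget`) are assumptions chosen to keep the example short. The abstract does not specify the classifier or the selection criterion.

```python
import math


class OnlineMultiLabelLearner:
    """One independent logistic model per label, updated by SGD (an assumption)."""

    def __init__(self, n_features, lr=0.5):
        self.n_features = n_features
        self.lr = lr
        self.weights = {}  # label name -> weight vector (last entry is the bias)

    def add_label(self, label):
        # New labels start from zero weights, so they need no pre-labeled samples.
        self.weights.setdefault(label, [0.0] * (self.n_features + 1))

    def predict_proba(self, x, label):
        w = self.weights[label]
        z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label, y):
        # One SGD step on the logistic loss for a single sample-label pair.
        w = self.weights[label]
        err = self.predict_proba(x, label) - y
        for i, xi in enumerate(x):
            w[i] -= self.lr * err * xi
        w[-1] -= self.lr * err


def select_pairs(learner, batch, budget):
    """Pick the `budget` sample-label pairs whose prediction is closest to 0.5."""
    scored = [
        (abs(learner.predict_proba(x, label) - 0.5), i, label)
        for i, x in enumerate(batch)
        for label in learner.weights
    ]
    scored.sort()
    return [(i, label) for _, i, label in scored[:budget]]


def run_stream(learner, batches, oracle, budget):
    """Process batches in arrival order; `oracle(x, label)` stands in for the annotator."""
    n_queries = 0
    for batch in batches:
        for i, label in select_pairs(learner, batch, budget):
            learner.update(batch[i], label, oracle(batch[i], label))
            n_queries += 1
    return n_queries
```

In this sketch a label can be registered with `add_label` between any two batches, which mirrors the claim that new labels can join the process at any time. Only the selected sample-label pairs are sent to the annotator, so the labeling cost per batch is bounded by `budget`.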

Xian-Sheng Hua | Guo-Jun Qi
