Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation

Active learning methods have been widely applied to reduce human labeling effort in multimedia annotation tasks. However, traditional methods usually annotate multiple concepts sequentially, i.e., each concept is exhaustively annotated before proceeding to the next, without taking the learnabilities of different concepts into consideration. Furthermore, most of these methods use only a single modality. This paper presents a novel multi-concept multi-modality active learning method which exchangeably annotates multiple concepts in the context of multiple modalities. It iteratively selects a concept and a batch of unlabeled samples, and these samples are then annotated with the selected concept. After that, graph-based semi-supervised learning is conducted on each modality for the selected concept. The proposed method takes into account both the learnabilities of different concepts and the potentials of different modalities. Experimental results on the TRECVID 2005 benchmark demonstrate its effectiveness and efficiency.
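To make the annotation loop concrete, below is a minimal Python sketch of one round of multi-concept multi-modality active learning. It is an illustration under simplifying assumptions, not the paper's actual algorithm: the concept-selection rule (pick the concept with the fewest labels), the equal-weight fusion of modalities, and the uncertainty-based batch selection are all stand-ins for the paper's learnability and modality-potential criteria. The label propagation step follows the standard local-and-global-consistency formulation; the helper names (`propagate_labels`, `active_annotation_round`, `oracle`) are hypothetical.

```python
import numpy as np

def propagate_labels(W, y_labeled, labeled_idx, alpha=0.99):
    """Graph-based semi-supervised propagation on one modality's affinity
    matrix W (local-and-global-consistency style: F = (I - alpha*S)^{-1} Y)."""
    n = W.shape[0]
    d = W.sum(axis=1)
    d[d == 0] = 1e-12                          # guard against isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt            # symmetrically normalized affinity
    Y = np.zeros(n)
    Y[labeled_idx] = y_labeled                 # +1 / -1 seeds for labeled samples
    return np.linalg.solve(np.eye(n) - alpha * S, Y)

def active_annotation_round(W_per_modality, labels, oracle, batch_size=10):
    """One round: pick a concept, pick a batch of unlabeled samples,
    query the human oracle, and record the new labels.

    labels: dict concept -> array of {+1, -1, 0}; 0 means unlabeled.
    oracle: callable (concept, sample_indices) -> array of {+1, -1}.
    """
    # Assumed concept-selection heuristic: the least-annotated concept
    # (a placeholder for a learnability-based criterion).
    concept = min(labels, key=lambda c: np.count_nonzero(labels[c]))

    y = labels[concept]
    labeled_idx = np.flatnonzero(y)
    # Fuse modalities by averaging propagated scores (equal weights assumed).
    scores = np.mean(
        [propagate_labels(W, y[labeled_idx], labeled_idx)
         for W in W_per_modality], axis=0)

    # Batch selection: the most uncertain unlabeled samples (scores nearest 0).
    unlabeled_idx = np.flatnonzero(y == 0)
    batch = unlabeled_idx[np.argsort(np.abs(scores[unlabeled_idx]))[:batch_size]]

    labels[concept][batch] = oracle(concept, batch)   # human annotation step
    return concept, batch
```

In this sketch, each call to `active_annotation_round` corresponds to one iteration of the proposed loop: concept selection, batch selection, human annotation, and per-modality semi-supervised re-learning for the selected concept.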
