Paper information - CRMActive: An Active Learning Based Approach for Effective Video Annotation and Retrieval

Conventional multimedia annotation/retrieval systems, such as the Normalized Continuous Relevance Model (NormCRM) [7], require fully labeled training data to perform well. Active learning, by determining the order in which the training data is labeled, allows for good performance even before the training data is fully annotated. In this work we propose an active learning algorithm that combines a novel measure of sample uncertainty with a novel clustering-based approach for determining sample density and diversity, and we integrate it with NormCRM. The clusters are also iteratively refined to ensure both feature-level and label-level agreement among samples. We show that our approach outperforms multiple baselines, both on a new, open dataset and on the popular TRECVID corpus, on the tasks of annotation and text-based retrieval of videos.
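To make the selection idea in the abstract concrete, the following is a minimal sketch of how an active learner might rank unlabeled samples by combining uncertainty, density, and diversity. It is not the paper's method: the abstract does not define its novel measures, so Shannon entropy stands in for uncertainty, relative cluster size for density, and a penalty on already-labeled clusters for diversity. The function names `binary_entropy` and `select_next_sample`, the multiplicative score, and the input format are all assumptions made for illustration.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli(p) prediction.

    Assumption: used here as a generic stand-in for the paper's
    (unspecified) uncertainty measure.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_next_sample(probs, cluster_ids, labeled):
    """Return the index of the unlabeled sample to annotate next.

    probs       -- predicted probability of the positive label per sample
    cluster_ids -- cluster assignment per sample (from any clustering step)
    labeled     -- set of indices that have already been annotated
    Returns None when every sample is already labeled.
    """
    n = len(probs)
    cluster_size = Counter(cluster_ids)
    labeled_in_cluster = Counter(cluster_ids[i] for i in labeled)

    best_index, best_score = None, -1.0
    for i in range(n):
        if i in labeled:
            continue
        # Uncertainty: how unsure the current model is about this sample.
        uncertainty = binary_entropy(probs[i])
        # Density (assumed proxy): share of all samples in the same cluster.
        density = cluster_size[cluster_ids[i]] / n
        # Diversity (assumed proxy): favor clusters with few labels so far.
        diversity = 1.0 / (1 + labeled_in_cluster[cluster_ids[i]])
        score = uncertainty * density * diversity
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

In a labeling loop, one would call `select_next_sample`, ask the annotator for that sample's label, add its index to `labeled`, retrain the model, and repeat. The product form means a sample must score reasonably on all three criteria to be chosen; a weighted sum would be an equally plausible design.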

Anton Leuski | Moitreya Chatterjee

[1] R. Manmatha, et al. Statistical models for automatic video annotation and retrieval, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Louis-Philippe Morency, et al. A Multimodal Context-based Approach for Distress Assessment, 2014, ICMI.

[3] Shih-Fu Chang, et al. Multimedia knowledge: discovery, classification, browsing, and retrieval, 2005.

[4] Klaus Brinker, et al. Incorporating Diversity in Active Learning with Support Vector Machines, 2003, ICML.

[5] Marc G. Genton, et al. Classes of Kernels for Machine Learning: A Statistics Perspective, 2002, J. Mach. Learn. Res.

[6] Louis-Philippe Morency, et al. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach, 2014, ICMI.

[7] Li-Rong Dai, et al. Video Annotation by Active Learning and Cluster Tuning, 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[8] Tony Jebara, et al. Probability Product Kernels, 2004, J. Mach. Learn. Res.

[9] Andrew W. Moore, et al. X-means: Extending K-means with Efficient Estimation of the Number of Clusters, 2000, ICML.

[10] Thomas S. Huang, et al. Leveraging Active Learning for Relevance Feedback Using an Information Theoretic Diversity Measure, 2006, CIVR.

[11] Stacy Marsella, et al. SmartBody: behavior realization for embodied conversational agents, 2008, AAMAS.

[12] Louis-Philippe Morency, et al. Verbal Behaviors and Persuasiveness in Online Multimedia Content, 2014, SocialNLP@COLING.

[13] Jan Sedmidubský, et al. Retrieving Similar Movements in Motion Capture Data, 2013, SISAP.

[14] Edward Y. Chang, et al. Support vector machine active learning for image retrieval, 2001, MULTIMEDIA '01.

[15] Wei Liu, et al. Multimedia classification and event detection using double fusion, 2013, Multimedia Tools and Applications.

[16] Thorsten Joachims, et al. Learning to classify text using support vector machines - methods, theory and algorithms, 2002, The Kluwer international series in engineering and computer science.

[17] Yi Yang, et al. Interactive Video Indexing With Statistical Active Learning, 2012, IEEE Transactions on Multimedia.

[18] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19] Louis-Philippe Morency, et al. Acoustic and para-verbal indicators of persuasiveness in social multimedia, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20] Edward Y. Chang, et al. Active Learning for Interactive Multimedia Retrieval, 2008, Proceedings of the IEEE.

[21] Renato De Mori, et al. A Cache-Based Natural Language Model for Speech Recognition, 1990, IEEE Trans. Pattern Anal. Mach. Intell.

[22] Louis-Philippe Morency, et al. Context-based signal descriptors of heart-rate variability for anxiety assessment, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).