
Accurate online video tagging via probabilistic hybrid modeling

Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework called Comprehensive Video Tagger (CVTagger) for accurate tag-based video annotation. The system exploits both the multimodal and the temporal properties of videos, combined with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture effectively incorporates both the dependencies among video concepts and their temporal dynamics. Empirical studies on a large-scale test collection of 50,000 YouTube videos demonstrate various advantages of CVTagger over state-of-the-art techniques.

Jialie Shen, Meng
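The abstract describes CVTagger only at a high level, so the following sketch does not reproduce its model. It is a minimal illustration, under our own assumptions, of the general idea of a hierarchical tagger. First-layer concept-detector scores are assumed to come from some multimodal classifiers and are simply given as numbers here. These scores are smoothed over neighboring shots to reflect temporal dynamics. A second-layer ridge regression then maps each concept vector to a tag score, so that co-occurring concepts can reinforce one another. The helper names (`temporal_smooth`, `solve`, `fit_ridge`, `predict`), the concept set, the tag "sailing", the window size and the penalty `lam` are all hypothetical.

```python
# Minimal sketch (NOT CVTagger itself): first-layer concept scores are
# smoothed across neighboring shots, then mapped to a tag score by a
# second-layer ridge regression. All names and numbers are hypothetical.


def temporal_smooth(scores, window=1):
    """Average each shot's concept scores with neighbors within +/- window shots."""
    n, k = len(scores), len(scores[0])
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - window), min(n, t + window + 1)
        smoothed.append([sum(scores[j][c] for j in range(lo, hi)) / (hi - lo)
                         for c in range(k)])
    return smoothed


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(x, y, lam=1e-3):
    """Fit y ~ w . x + b via the regularized normal equations; the bias is not penalized."""
    rows = [list(xi) + [1.0] for xi in x]  # append a constant bias feature
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i < d - 1 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, xi):
    """Second-layer tag score for one shot's (smoothed) concept vector."""
    return sum(wj * vj for wj, vj in zip(w, list(xi) + [1.0]))


if __name__ == "__main__":
    # Hypothetical first-layer scores per shot for concepts [outdoor, water, boat].
    train_x = [[0.9, 0.8, 0.7], [0.8, 0.1, 0.0], [0.2, 0.9, 0.1],
               [0.9, 0.9, 0.9], [0.1, 0.0, 0.1]]
    train_y = [1.0, 0.0, 0.0, 1.0, 0.0]  # hypothetical labels for the tag "sailing"
    w = fit_ridge(train_x, train_y, lam=0.1)

    # Consecutive shots of a new video: smooth over time, then score the tag.
    video = [[0.8, 0.7, 0.6], [0.9, 0.2, 0.8], [0.7, 0.8, 0.7]]
    for t, shot in enumerate(temporal_smooth(video, window=1)):
        print(f"shot {t}: sailing score {predict(w, shot):.3f}")
```

A linear second layer is the simplest way to let one concept's score inform another tag. A probabilistic or nonlinear model could replace `fit_ridge` without changing the overall two-layer structure.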
