Accurate online video tagging via probabilistic hybrid modeling

Accurate video tagging has become increasingly crucial for online video management and search. This article presents a novel framework called comprehensive video tagger (CVTagger) to facilitate accurate tag-based video annotation. The system exploits both multimodal and temporal properties of video, combined with a novel hierarchical classification framework based on a multilayer concept model and regression analysis. This architecture enables effective incorporation of both video concept dependency and temporal dynamics. A set of empirical studies on a large-scale test collection of 50,000 YouTube videos demonstrates various advantages of CVTagger over state-of-the-art techniques.
