Web video categorization using category-predictive classifiers and category-specific concept classifiers

Automatic Web video categorization has become an important multimedia task for organizing and retrieving the vast number of videos on the Web. Because Web videos vary without bound in both content and quality, and because precisely labeled training data are scarce, Web video categorization remains challenging. This paper proposes a novel three-stage framework for Web video categorization that uses category-predictive classifiers and category-specific concept classifiers, integrating contextual features with concept-level semantics induced from visual content. First, a content-based category-predictive (CNC) classifier is trained for each category on visual features to classify Web videos. Second, the significance of each concept for each category is measured with category-specific concept (CSC) classifiers, and this significance is used to refine the CNC classifiers at the keyframe level. Third, context-based category-predictive (CXC) classifiers induced from titles and tags are combined with the refined CNC classifiers to further improve performance. Experiments on two large-scale Web video datasets, MCG-WEBV and CCV, demonstrate that the proposed approach achieves promising performance.

Propose a novel fusion of category and concept classifiers for Web video classification.

Exploit a Flickr-enriched context space for category-relevant concept selection.

Fuse content-based and context-based category-predictive classifiers to enhance performance.
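The three stages described above can be illustrated with a minimal sketch. The abstract does not give the refinement or fusion formulas, so everything here is an assumption made for illustration: the function names, the linear blending weights `alpha` and `beta`, averaging over keyframes, and the toy score values are hypothetical and are not the authors' method.

```python
import numpy as np


def refine_cnc_scores(cnc_scores, concept_scores, significance, alpha=0.5):
    """Stage 2 (assumed form): blend keyframe-level CNC scores with concept evidence.

    cnc_scores:     (K, C) content-based category scores for K keyframes.
    concept_scores: (K, M) concept detector scores for the same keyframes.
    significance:   (M, C) assumed weight of each concept for each category.
    """
    concept_evidence = concept_scores @ significance  # (K, C)
    return (1.0 - alpha) * cnc_scores + alpha * concept_evidence


def fuse_with_context(refined_scores, cxc_scores, beta=0.5):
    """Stage 3 (assumed form): late fusion of content and context scores.

    refined_scores: (K, C) refined keyframe-level scores.
    cxc_scores:     (C,) context-based category scores from titles and tags.
    """
    video_scores = refined_scores.mean(axis=0)  # pool keyframes into one video score
    return beta * video_scores + (1.0 - beta) * cxc_scores


# Toy example: 2 keyframes, 2 concepts, 2 categories (all values made up).
cnc = np.array([[0.6, 0.4], [0.5, 0.5]])           # stage 1 outputs
concepts = np.array([[1.0, 0.0], [0.0, 1.0]])      # concept detections
significance = np.array([[0.9, 0.1], [0.2, 0.8]])  # concept-to-category weights
cxc = np.array([0.2, 0.8])                         # context classifier outputs

refined = refine_cnc_scores(cnc, concepts, significance)
final = fuse_with_context(refined, cxc)
predicted_category = int(np.argmax(final))
```

In this sketch the context scores shift the decision toward the second category even though the content scores alone slightly favor the first, which is the kind of complementary effect that combining the two classifier families is meant to capture.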
