A visual-based late-fusion framework for video genre classification

In this paper, we investigate the performance of visual features in the context of video genre classification. We propose a late-fusion framework that combines color, texture, structural, and salient region information. Experimental validation was carried out in the context of the MediaEval 2012 Genre Tagging Task, using a large data set of more than 2,000 hours of footage and 26 video genres. Results show that the proposed approach significantly improves genre classification performance, outperforming other existing approaches. Furthermore, we show that our approach can help improve the performance of the more efficient text-based approaches.
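To make the idea of late fusion concrete, the following is a minimal sketch, not the method of the paper: it assumes that each visual descriptor (color, texture, structure, salient regions) has its own classifier producing one confidence score per genre, and that these scores are combined by a normalized weighted sum. The function name `late_fusion`, the uniform weights, and all score values are hypothetical and chosen only for illustration; the abstract does not specify the fusion rule actually used.

```python
from typing import Dict, List


def late_fusion(scores: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Combine per-descriptor genre scores with a normalized weighted sum.

    scores:  descriptor name -> list of confidence scores, one per genre
    weights: descriptor name -> non-negative importance weight
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_genres = len(next(iter(scores.values())))
    fused = [0.0] * n_genres
    for name, descriptor_scores in scores.items():
        if len(descriptor_scores) != n_genres:
            raise ValueError(f"descriptor '{name}' has a different number of genres")
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, value in enumerate(descriptor_scores):
            fused[i] += w * value
    return fused


# Hypothetical classifier outputs for a single video and three genres.
scores = {
    "color": [0.2, 0.7, 0.1],
    "texture": [0.3, 0.5, 0.2],
    "structure": [0.1, 0.6, 0.3],
    "salient_regions": [0.4, 0.4, 0.2],
}
weights = {name: 1.0 for name in scores}  # assumption: equal weights

fused = late_fusion(scores, weights)
predicted = max(range(len(fused)), key=fused.__getitem__)  # index of top genre
```

The fusion happens after classification, so each descriptor can use its own classifier and feature space, and the weights could in principle be tuned on a validation set rather than fixed as here.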
