Fusing audio vocabulary with visual features for pornographic video detection

Pornographic video detection based on multimodal fusion is an effective approach to filtering pornography. However, existing methods represent audio semantics inaccurately and pay little attention to the characteristics of pornographic audio. In this paper, we propose a novel framework that fuses an audio vocabulary with visual features for pornographic video detection. The novelty of our approach lies in three aspects: an audio semantics representation method based on the energy envelope unit (EEU) and bag-of-words (BoW), a periodicity-based audio segmentation algorithm, and a periodicity-based video decision algorithm. The first, named the EEU+BoW representation method, describes audio semantics via an audio vocabulary constructed by k-means clustering of EEUs. The latter two complement each other to make full use of the periodicity in pornographic audio. The periodicity-based audio segmentation algorithm divides audio streams into EEU sequences. After these EEUs are classified, the periodicity-based video decision algorithm judges whether a video is pornographic. Before fusion, two support vector machines are applied, one for the audio-vocabulary-based method and one for the visual-features-based method. To fuse their results, a keyframe is selected from each EEU according to its beginning and ending positions, and then an integrated weighting scheme and the periodicity-based video decision algorithm are applied to yield the final detection results. Experimental results show that our approach outperforms the traditional approach based only on visual features: the true positive rate reaches 94.44% while the false positive rate is 9.76%.
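The audio-vocabulary idea can be illustrated with a minimal sketch. It assumes each EEU has already been reduced to a fixed-length feature vector; the abstract does not specify those features, so the vectors below are hypothetical placeholders. The function names (`kmeans`, `bow_histogram`), the deterministic first-k initialization, and the normalization of the histogram are likewise assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid (audio word) closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain k-means; the returned centroids act as the audio vocabulary.

    Assumption: the first k points seed the centroids so the result is
    reproducible. A real system would likely use a better initialization.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                updated.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids


def bow_histogram(eeu_features: Sequence[Vector],
                  vocabulary: Sequence[Vector]) -> List[float]:
    """Normalized count of audio words over a sequence of EEU vectors."""
    counts = [0] * len(vocabulary)
    for feat in eeu_features:
        counts[nearest(feat, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical 2-D EEU features used to build a two-word vocabulary.
training_eeus = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
vocabulary = kmeans(training_eeus, k=2)

# BoW representation of one (hypothetical) audio clip.
clip_eeus = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (0.2, 0.4)]
print(bow_histogram(clip_eeus, vocabulary))
```

In a pipeline like the one described, a histogram of this kind would be the input to the audio-side support vector machine; the segmentation into EEUs and the periodicity-based decision step are outside the scope of this sketch.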
