Acoustic Pornography Recognition Using Recurrent Neural Network

This research proposes a pornography recognition model that uses audio features and a recurrent neural network. The proposed work differs from most previous attempts at pornography recognition, which rely on the visual content of nude images or videos. Sound becomes important for pornography recognition when the visual features are not adequately informative of nudity, such as in dark scenes and scenes with covered bodies. The acoustic recognition model represents the signal with Mel-Frequency Cepstral Coefficients (MFCCs), which are extracted after the audio has been pre-processed for noise reduction. The extracted features are fed into a network with long short-term memory (LSTM) cells. LSTM cells can handle problems that involve temporal dependencies and require learning long-term context, and they mitigate the vanishing gradient problem associated with standard Recurrent Neural Networks (RNNs). A dataset of pornographic sound samples was created from an existing pornography video dataset; 800 data samples were used, including both positive and negative samples. The experimental results confirm the feasibility of the proposed acoustic-driven approach, demonstrating an accuracy of 86.50% and an F-score of 86.83% in the task of pornography recognition.
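The pipeline described above (MFCC extraction followed by an LSTM) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the frame length, hop size, number of mel filters, number of coefficients, hidden size, or classifier head, so every such value below is an assumption chosen for illustration. The noise-reduction step is omitted, the weights are random and untrained, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    # All default values are assumptions, not taken from the paper.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)           # windowed frames
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the log mel energies gives the cepstrum.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None]
                   * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T                               # (n_frames, n_coeffs)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(x_seq, W, U, b):
    # Standard LSTM forward pass; returns the hidden state after the last frame.
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in x_seq:
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)       # cell state update
        h = sigmoid(o) * np.tanh(c)
    return h

# Demo on one second of synthetic noise standing in for an audio clip.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
feats = mfcc(audio)

hidden = 32                                                # assumed hidden size
W = rng.standard_normal((4 * hidden, feats.shape[1])) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
w_out = rng.standard_normal(hidden) * 0.1                  # assumed logistic head

prob = sigmoid(w_out @ lstm_last_hidden(feats, W, U, b))
print(feats.shape, float(prob))
```

The additive cell-state update `c = f * c + i * g` is what lets gradients flow across many frames, which is the property the abstract refers to when it mentions long-term dependencies. In practice the weights would be learned from labelled clips, and the printed score is meaningless until then.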
