Bootstrapping deep feature hierarchy for pornographic image recognition

Automatically recognizing pornographic images on the Web is a vital step toward cleaning up the Internet environment. Inspired by the rapid development of deep learning models, we present a deep convolutional neural network (CNN) architecture for high-accuracy pornographic image recognition. The proposed architecture builds on existing CNNs, accepts input images of different sizes, and incorporates features from different levels of the hierarchy to perform prediction. To train the model effectively, we propose a two-stage training strategy that learns the model parameters from scratch and end-to-end. During training, we also employ a hard negative sampling strategy to further reduce the model's false positive rate. Experimental results on a large dataset demonstrate the good performance of the proposed model and the effectiveness of our training strategies, with a considerable improvement over some traditional methods that use hand-crafted features and over a deep learning method that uses a mainstream CNN architecture.
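
The abstract does not say how hard negatives are chosen, so the following is only a minimal sketch of one common form of hard negative sampling: after a training round, the negative images that the current model scores as positive (its false positives) are collected and fed back into the next round. The function name `select_hard_negatives`, the `threshold` and `max_count` parameters, and the example scores are all hypothetical and are not taken from the paper.

```python
def select_hard_negatives(scores, labels, threshold=0.5, max_count=None):
    """Return indices of negative samples that the model scores as positive.

    scores: predicted positive-class probability per sample
            (assumed output of some already trained model)
    labels: ground-truth labels, 1 = positive, 0 = negative
    threshold: decision threshold; negatives at or above it are false positives
    max_count: optional cap on how many hard negatives to return
    """
    # Keep only true negatives that the model currently misclassifies.
    hard = [
        i
        for i, (score, label) in enumerate(zip(scores, labels))
        if label == 0 and score >= threshold
    ]
    # Hardest first: the most confidently wrong predictions lead the list.
    hard.sort(key=lambda i: scores[i], reverse=True)
    if max_count is not None:
        hard = hard[:max_count]
    return hard


# Hypothetical scores for five images; only the first one is a true positive.
example_scores = [0.9, 0.2, 0.7, 0.95, 0.6]
example_labels = [1, 0, 0, 0, 0]
hard_indices = select_hard_negatives(example_scores, example_labels)
```

In a training loop, the images at `hard_indices` would be added to (or oversampled in) the negative set for the next round, which is what pushes the false positive rate down.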
