Adult Image and Video Recognition by a Deep Multicontext Network and Fine-to-Coarse Strategy

Adult image and video recognition is an important and challenging real-world problem. Low-level feature cues do not carry enough discriminative information, especially when the dataset is very large and drawn from varied data distributions, and this is a serious limitation of conventional approaches. In this article, we address the problem by proposing a deep multicontext network with a fine-to-coarse strategy for adult image and video recognition. We employ a deep convolutional network to model fused features of sensitive objects in images. Global and local contexts are both taken into account and are jointly modeled in a unified multicontext deep learning framework. To make the model more discriminative for diverse target objects, we investigate a novel hierarchical method and design a task-specific fine-to-coarse strategy that makes the multicontext modeling better suited to adult object recognition. We also investigate several recently proposed deep models. Our approach is extensively evaluated on four different datasets: one is used for ablation experiments, and the others are used for generalization experiments. Results show significant and consistent improvements over state-of-the-art methods.
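As a rough illustration of what jointly modeling global and local contexts can mean, the following minimal sketch fuses a whole-image feature vector with pooled region-level feature vectors before a single classification layer. It is not the network proposed in the article: the function names, the max-pooling of local features, the concatenation, the logistic output layer, and all numeric values are assumptions chosen only to make the idea concrete.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over a non-empty list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def fuse_contexts(global_feat, local_feats):
    """Concatenate the global descriptor with the max-pooled local descriptors.

    Hypothetical fusion rule: the abstract does not specify how the
    two contexts are combined.
    """
    return list(global_feat) + max_pool(local_feats)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(fused, weights, bias):
    """Logistic score in (0, 1) from a linear layer over the fused features."""
    return sigmoid(sum(w * f for w, f in zip(weights, fused)) + bias)


# Toy inputs (made-up numbers): one global vector for the whole image
# and two local vectors for candidate regions.
global_feat = [0.2, 0.7]
local_feats = [[0.1, 0.9], [0.4, 0.3]]

fused = fuse_contexts(global_feat, local_feats)
weights = [0.5, -0.25, 1.0, 0.75]  # hypothetical weights, not learned
p = score(fused, weights, bias=-0.5)
print(fused, round(p, 4))
```

In a real system the feature vectors would come from convolutional branches and the weights would be learned end to end; the sketch only shows that a single decision layer can see both contexts at once.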
