Filtering adult image content with topic models

Protecting children from exposure to adult content has become a serious real-world problem. Current statistics indicate, for instance, that the average age of first Internet exposure to pornography is 11 years, that the largest consumer group of Internet pornography is 12-to-17-year-olds, and that 90% of 8-to-16-year-olds have viewed pornography online. Protecting children therefore requires effective algorithms for detecting adult images. In this research we evaluate the use of probabilistic Latent Semantic Analysis (pLSA) for this task. We show that topic models based on pLSA can detect adult content with a correct positive rate of 92.7% at a false positive rate of only 1.9%. Even when using grayscale images only, a correct positive rate of 90.8% at a false positive rate of 2% can be achieved.
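To make the pLSA component concrete, the following is a minimal sketch of the standard EM fitting procedure, assuming each image has already been converted into a vector of visual-word counts. The function name `fit_plsa`, its parameters, and the count-matrix input are illustrative assumptions; the abstract does not specify the features, the number of topics, or the classifier applied to the resulting topic distributions.

```python
import numpy as np

def fit_plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM (illustrative sketch, not the paper's implementation).

    counts: (n_docs, n_words) array; counts[d, w] is how often visual
            word w occurs in image d (assumed input representation).
    Returns P(w|z) with shape (n_topics, n_words) and
            P(z|d) with shape (n_docs, n_topics).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised to valid distributions.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) * P(w|z),
        # shape (n_docs, n_topics, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_w_z, p_z_d
```

In a filtering pipeline of this kind, the per-image topic vector P(z|d) would typically serve as a low-dimensional feature for a separate adult/non-adult classifier; that downstream step is an assumption here and is not shown.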
