Pornography Detection with the Wisdom of Crowds

With the rapid development of the Internet, much attention has been paid to the problem of children's exposure to Internet pornography. Existing detection techniques, which mainly focus on analyzing pornographic content, have achieved considerable success. However, they still face challenges in practical Web environments because of their high computational costs and the difficulty of handling the many forms that pornography takes. We approach this problem from a new perspective, drawing on the wisdom of crowds recorded in search engine click-through logs. Inspired by the idea that different pornographic Web pages may be reached through similar search keywords, we propose a label propagation method on the click-through bipartite graph that can locate pornographic Web pages starting from a small set (a few hundred) of manually labeled seed pages. Experiments on datasets collected from both English and Chinese search engines show that the proposed algorithm can identify different forms of Internet pornography both effectively and efficiently.
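The abstract does not spell out the propagation rule, so the following is only a minimal sketch of generic label propagation with clamped labels on a query-URL click-through bipartite graph, not the authors' algorithm. Everything concrete here is an assumption: raw click counts as edge weights, click-weighted means as the update in each direction, optional negative seeds clamped to 0, a fixed iteration count, and the hypothetical function name `propagate_labels` and toy log.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, negative_seeds=frozenset(), iterations=20):
    """Hypothetical sketch of label propagation on a click-through graph.

    clicks: dict mapping (query, url) -> positive click count (edge weight)
    seeds: URLs manually labeled as pornographic (clamped to 1.0)
    negative_seeds: URLs manually labeled as non-pornographic (clamped to 0.0)
    Returns a dict mapping every URL in the graph to a score in [0, 1].
    """
    # Bipartite adjacency in both directions, weighted by click counts.
    url_to_queries = defaultdict(dict)
    query_to_urls = defaultdict(dict)
    for (query, url), count in clicks.items():
        url_to_queries[url][query] = count
        query_to_urls[query][url] = count

    def clamp(scores):
        # Labeled pages keep their manual labels on every iteration.
        for url in seeds:
            if url in scores:
                scores[url] = 1.0
        for url in negative_seeds:
            if url in scores:
                scores[url] = 0.0
        return scores

    # Unlabeled pages start at 0.0; seeds are clamped immediately.
    url_score = clamp({url: 0.0 for url in url_to_queries})

    for _ in range(iterations):
        # URL -> query: a query takes the click-weighted mean of its URLs.
        query_score = {
            q: sum(url_score[u] * c for u, c in urls.items()) / sum(urls.values())
            for q, urls in query_to_urls.items()
        }
        # Query -> URL: a URL takes the click-weighted mean of its queries.
        url_score = clamp({
            u: sum(query_score[q] * c for q, c in qs.items()) / sum(qs.values())
            for u, qs in url_to_queries.items()
        })
    return url_score


# Hypothetical toy log: a.example is a positive seed, c.example a negative seed.
clicks = {
    ("adult videos", "a.example"): 5,
    ("adult videos", "b.example"): 3,
    ("free movies", "b.example"): 2,
    ("free movies", "c.example"): 4,
    ("weather today", "d.example"): 7,
}
scores = propagate_labels(clicks, seeds={"a.example"}, negative_seeds={"c.example"})
for url, score in sorted(scores.items()):
    print(f"{url}: {score:.3f}")
```

In this toy log, b.example shares the query "adult videos" with the seed and ends up with an intermediate score, while d.example, which is reached only through an unrelated query, stays at 0. The negative seeds are a design choice worth noting: if only positive seeds are clamped, every page connected to a seed drifts toward 1.0 as the number of iterations grows, so some counterweight (labeled negatives, a fixed iteration budget, or a score threshold) is needed to keep the scores informative.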
