Method for Pornography Filtering in the WEB Based on Automatic Classification and Natural Language Processing

The paper presents a method for detecting pornography in web pages based on natural language processing. The described classification method uses a feature set of single words and groups of words. Syntactic analysis is performed to extract collocations, and a modification of TF-IDF is used to weight terms. An evaluation and comparison of classification quality and performance are given.
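
As a rough illustration of term weighting over single words and word groups, the following minimal sketch computes standard TF-IDF weights. It is not the paper's method: the abstract does not specify the modification of TF-IDF, so the textbook formula is used here, and adjacent word pairs stand in for the collocations that the paper extracts by syntax analysis. The function names `extract_terms` and `tfidf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def extract_terms(text):
    """Return single words plus adjacent word pairs.

    Assumption: adjacent pairs are a crude stand-in for the
    syntactically extracted collocations described in the abstract.
    """
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def tfidf(docs):
    """Compute textbook TF-IDF weights, one dict per document.

    Assumption: this is plain TF-IDF, not the modified weighting
    used in the paper, which the abstract does not define.
    """
    term_lists = [extract_terms(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    sample = ["free adult video", "free software download", "adult video site"]
    for doc, w in zip(sample, tfidf(sample)):
        top = sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(doc, "->", top)
```

The resulting weight dictionaries could serve as feature vectors for any standard text classifier. Terms that occur in every document receive a weight of zero, which is the usual behaviour of the unsmoothed IDF factor.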
