A Fuzzy Ontology and SVM–Based Web Content Classification System

The volume of adult content on the World Wide Web is increasing rapidly. This makes automatic detection of adult content, which is needed to block access to unsuitable websites, an increasingly challenging task. Most pornographic webpage–filtering systems are based on n-gram, naïve Bayes, k-nearest neighbor, and keyword-matching mechanisms, which cannot reliably extract useful information from unstructured web content. These systems also lack the reasoning capability needed to distinguish medical webpages from adult-content webpages. In addition, because adult content is freely available on the Internet, children can easily access pornographic webpages. This creates a problem for parents who wish to protect their children from such unsuitable content. To solve these problems, this paper presents a support vector machine (SVM) and fuzzy ontology–based semantic knowledge system that systematically filters web content to identify and block access to pornography. The proposed system first classifies URLs into adult URLs and medical URLs by using a blacklist of censored webpages, which provides both accuracy and speed. The proposed fuzzy ontology then extracts web content to determine the website type (adult, normal, or medical) and blocks pornographic content. To examine the efficiency of the proposed system, the fuzzy ontology and the intelligent tools were developed using Protégé 5.1 and Java, respectively. Experimental analysis shows that the proposed system detects and blocks adult content automatically and efficiently.
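The two-stage flow summarized above (a URL list lookup, followed by content analysis that assigns an adult, normal, or medical label) can be illustrated with a minimal sketch. This is not the authors' implementation. The URL list, the term vocabularies, the keyword-density features, and the membership-function breakpoints are all hypothetical. The sketch also assumes that content analysis runs only when a URL is not already listed. A small fuzzy rule base stands in for the fuzzy ontology reasoning, and the SVM stage is omitted entirely.

```python
import re
from urllib.parse import urlparse

# HYPOTHETICAL URL list standing in for the blacklist of censored webpages.
URL_LIST = {
    "adult-example.test": "adult",
    "clinic-example.test": "medical",
}

# HYPOTHETICAL vocabularies standing in for ontology concepts.
ADULT_TERMS = {"xxx", "nude", "porn", "explicit"}
MEDICAL_TERMS = {"anatomy", "clinic", "diagnosis", "patient", "breast", "cancer"}


def shoulder_up(x, a, b):
    """Membership that rises linearly from 0 at a to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def high(density):
    # Assumed breakpoints: densities of 1% to 5% of tokens ramp from "not high" to "high".
    return shoulder_up(density, 0.01, 0.05)


def low(density):
    return 1.0 - high(density)


def classify(url, text):
    """Return 'adult', 'medical', or 'normal' for a page."""
    host = urlparse(url).hostname or ""
    # Stage 1: fast lookup in the (hypothetical) URL list.
    if host in URL_LIST:
        return URL_LIST[host]
    # Stage 2: fuzzy content analysis on keyword densities.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return "normal"
    adult = sum(t in ADULT_TERMS for t in tokens) / len(tokens)
    medical = sum(t in MEDICAL_TERMS for t in tokens) / len(tokens)
    # Rule strengths use min for AND (Mamdani-style rules).
    strengths = {
        "adult": min(high(adult), low(medical)),  # adult terms high AND medical terms low
        "medical": high(medical),                 # medical terms high
        "normal": min(low(adult), low(medical)),  # both low
    }
    # max() returns the first key among ties, so ties resolve toward "adult" (blocking).
    return max(strengths, key=strengths.get)


def should_block(url, text):
    return classify(url, text) == "adult"


print(classify("https://unknown.test/", "nude patient anatomy diagnosis"))
```

In this sketch the medical rule fires on a high medical-term density whether or not adult terms are also present. This is one simple way for a page such as "nude patient anatomy diagnosis" to be labeled medical rather than blocked, which is the kind of distinction the abstract says keyword-only filters cannot make.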
