An Approach to Creating an Intelligent System for Detecting and Countering Inappropriate Information on the Internet

The Internet is becoming one of the most serious threats to personal, public, and state information security, so detecting and counteracting inappropriate information in digital network content has become a task of national importance. This paper proposes a new approach to building an intelligent system for detecting and counteracting inappropriate information on the Internet, based on machine learning methods and big data processing, and describes the architecture of such a system. One of the most important system components, which performs multidimensional evaluation and categorization of information objects, was evaluated experimentally in single-threaded and multi-threaded modes. The evaluation showed that various classifiers from the Python Scikit-learn and Spark MLlib libraries solve this problem with high efficiency.
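As a rough illustration of what categorizing information objects in single-threaded and multi-threaded modes can look like, the sketch below trains a tiny multinomial Naive Bayes text classifier in plain Python and applies it to the same inputs sequentially and through a thread pool. It is not the system's actual component and does not use Scikit-learn or Spark MLlib: the training corpus, the category names, and the `train` and `categorize` helpers are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus: (text, category) pairs invented for illustration.
TRAIN = [
    ("win free money now", "inappropriate"),
    ("free casino bonus click now", "inappropriate"),
    ("weather forecast for the weekend", "acceptable"),
    ("library opening hours and events", "acceptable"),
]


def train(samples):
    """Collect per-category word and document counts for Naive Bayes."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def categorize(model, text):
    """Return the most probable category, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train(TRAIN)
PAGES = ["free money bonus", "weekend events at the library"]

# Single-threaded mode: categorize the objects one after another.
single = [categorize(MODEL, page) for page in PAGES]

# Multi-threaded mode: the same work distributed over a thread pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    multi = list(pool.map(lambda page: categorize(MODEL, page), PAGES))

print(single, multi)
```

Both modes should assign the same categories. In a real deployment the classifier would come from a library such as Scikit-learn or Spark MLlib, and the thread pool here only illustrates the structure of the two modes rather than any speedup.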
