Social Network Polluting Contents Detection through Deep Learning Techniques

Social networks are now widely used not only to let users share comments with one another but also as a source from which knowledge can be extracted. They are increasingly used to gauge opinion trends about a politician or a particular event; more generally, social networks have proved useful to both governments and companies for understanding public opinion. End users, too, find it difficult to identify real content. For this reason, recent years have seen growing interest in tools that analyze big data gathered from social networks in order to find common opinions. In this context, content polluters on social networks hinder opinion mining by making valuable content harder to find. In this paper we propose a method to discriminate between polluting and real information from a semantic point of view. We combine word embedding and deep learning techniques to categorize semantic similarities between polluting and real sentences. We evaluate the proposed method on a dataset of real-world sentences gathered from the Twitter social network, obtaining interesting results in terms of precision and recall.
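As a rough illustration of the ingredients named in the abstract, the following minimal sketch shows how sentences can be compared through word embeddings and how precision and recall are computed. It is not the paper's pipeline: the three-dimensional vectors in TOY_EMBEDDINGS are invented for the example, averaging word vectors stands in for whatever sentence representation the authors use, and no deep learning model is trained here. All function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real system would load embeddings learned from a large corpus.
TOY_EMBEDDINGS = {
    "win": [0.9, 0.1, 0.0],
    "free": [0.8, 0.2, 0.1],
    "followers": [0.7, 0.3, 0.0],
    "meeting": [0.1, 0.9, 0.2],
    "today": [0.0, 0.8, 0.3],
}


def sentence_vector(sentence, embeddings):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    polluting = sentence_vector("win free followers", TOY_EMBEDDINGS)
    similar = sentence_vector("win free", TOY_EMBEDDINGS)
    unrelated = sentence_vector("meeting today", TOY_EMBEDDINGS)
    print(cosine_similarity(polluting, similar))
    print(cosine_similarity(polluting, unrelated))
    # Label 1 marks a polluting sentence, 0 a real one (made-up labels).
    print(precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

With these toy vectors, sentences that share vocabulary end up closer in the embedding space than unrelated ones, which is the kind of signal a downstream classifier could use to separate the two classes.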
