Exploiting the Spam Correlations in Scalable Online Social Spam Detection

Large volumes of social spam have become a common phenomenon on large-scale social networks. Most prior research has focused on algorithmic improvements to the efficiency of identifying social spam in datasets of limited size; however, few studies target the data correlations among large-scale, distributed social spam or exploit the benefits available on the system side. In this paper, we propose a new scalable system, named SpamHunter, which utilizes the spam correlations across distributed data sources to improve the performance of large-scale social spam detection. SpamHunter identifies correlated social spam across various distributed servers/sources through DHT-based hierarchical functional trees. These functional trees act as bridges among the data servers/sources, allowing them to aggregate, exchange, and communicate updated and newly emerging social spam with one another. Furthermore, SpamHunter processes online social logs as they arrive, handling streaming data in a distributed manner; this reduces online detection latency and avoids the inefficiency of acting on outdated spam posts. Experimental results on real-world social logs demonstrate that SpamHunter reaches a 95% F1 score in spam detection and scales efficiently to a large number of data servers with low latency.
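The abstract does not specify how the DHT-based hierarchical functional trees are constructed. The following Python sketch illustrates the general idea under stated assumptions: each server is hashed onto a circular identifier space (SHA-1 truncated to 32 bits, by assumption), the tree for a given spam key is rooted at the server whose ID is closest to the key's hash (as a DHT assigns keys to nodes), and the remaining servers are attached as a k-ary tree ordered by ID distance. This simplified construction stands in for the prefix-routing trees that Pastry/Scribe-style systems build; it is not SpamHunter's actual algorithm. All function names, the `fanout` parameter, and the spam signature strings are hypothetical.

```python
import hashlib
from collections import defaultdict

# Hypothetical small ID space for readability; Pastry itself uses 128-bit IDs.
ID_BITS = 32


def node_id(name: str) -> int:
    """Hash a server name or key into the DHT identifier space (assumed SHA-1)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular identifier space."""
    d = abs(a - b)
    return min(d, (1 << ID_BITS) - d)


def build_tree(servers, key, fanout=2):
    """Build a hypothetical k-ary functional tree for one spam key.

    The root is the server whose ID is closest to hash(key); the remaining
    servers are attached level by level in order of distance, so closer
    servers sit nearer the root. Returns (root, children).
    """
    target = node_id(key)
    ordered = sorted(servers, key=lambda s: (ring_distance(node_id(s), target), s))
    children = defaultdict(list)
    for i in range(1, len(ordered)):
        # Heap-style layout: the parent of position i is (i - 1) // fanout.
        parent = ordered[(i - 1) // fanout]
        children[parent].append(ordered[i])
    return ordered[0], children


def aggregate(node, children, local_spam):
    """Merge spam signatures bottom-up: each node unions its own with its subtree's."""
    merged = set(local_spam.get(node, set()))
    for child in children.get(node, []):
        merged |= aggregate(child, children, local_spam)
    return merged


def disseminate(root, children, signatures, local_spam):
    """Push the aggregated signature set back down so every server shares it."""
    stack = [root]
    while stack:
        node = stack.pop()
        local_spam[node] = set(signatures)
        stack.extend(children.get(node, []))


# Usage: three servers have each seen part of the spam; the tree merges it.
servers = [f"server-{i}" for i in range(7)]
local_spam = {
    "server-0": {"url:bit.ly/abc"},
    "server-3": {"url:bit.ly/abc", "text:free-iphone"},
    "server-6": {"text:win-cash"},
}
root, children = build_tree(servers, key="campaign:url-spam", fanout=2)
global_spam = aggregate(root, children, local_spam)
disseminate(root, children, global_spam, local_spam)
print(sorted(global_spam))
# ['text:free-iphone', 'text:win-cash', 'url:bit.ly/abc']
```

Aggregating up the tree and then disseminating back down means each server exchanges signatures only with its parent and children rather than with every other server, which is the kind of bridging role the abstract attributes to the functional trees.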
