Oases: An Online Scalable Spam Detection System for Social Networks

Web-based social networks enable new community-based opportunities for participants to engage, share their thoughts, and interact with each other. These activities, along with related ones such as searching and advertising, are threatened by spammers, content polluters, and malware disseminators. We propose Oases, a scalable system for uncovering social spam in social networks online. The novelty of our design lies in two key components: (1) a decentralized DHT-based tree overlay deployment for harvesting and uncovering deceptive spam from social communities; and (2) a progressive aggregation tree that aggregates the properties of these spam posts to create new spam classifiers that actively filter out new spam. We design and implement a prototype of Oases and discuss the design considerations of the proposed approach. Our large-scale experiments using real-world Twitter data demonstrate scalability, attractive load balancing, and graceful efficiency in online spam detection for social networks.
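
To make the second component concrete, the following is a minimal sketch of bottom-up aggregation over a tree, not the Oases implementation. It assumes that each leaf holds simple counts of tokens seen in locally harvested spam posts and that a parent merges its children's counts by summation. The `Node` class, the `build_example_tree` helper, and the example tokens are all hypothetical, and the sketch does not model the DHT overlay, the actual spam properties, or classifier training.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node in a hypothetical aggregation tree (illustrative only)."""
    name: str
    # Token counts observed locally at this node (assumed feature format).
    local_counts: Counter = field(default_factory=Counter)
    children: List["Node"] = field(default_factory=list)

    def aggregate(self) -> Counter:
        """Return this node's counts merged with those of its whole subtree."""
        total = Counter(self.local_counts)
        for child in self.children:
            # Each child first summarizes its own subtree, then the
            # parent adds that summary to its running total.
            total.update(child.aggregate())
        return total


def build_example_tree() -> Node:
    """Build a small three-leaf tree with made-up spam token counts."""
    leaf_a = Node("leaf_a", Counter({"free-followers": 3, "bit.ly": 1}))
    leaf_b = Node("leaf_b", Counter({"bit.ly": 2}))
    leaf_c = Node("leaf_c", Counter({"free-followers": 1, "win-prize": 4}))
    mid = Node("mid", children=[leaf_a, leaf_b])
    return Node("root", children=[mid, leaf_c])


if __name__ == "__main__":
    root = build_example_tree()
    for token, count in sorted(root.aggregate().items()):
        print(token, count)
```

In this sketch only merged summaries travel upward, so the root sees totals for the whole tree without receiving the individual posts; a real deployment would presumably feed such summaries into classifier construction.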
