Pseudo-Honeypot: Toward Efficient and Scalable Spam Sniffer

Honeypot-based solutions for gathering spammers commonly suffer from three drawbacks: limited attribute variability, deployment flexibility, and network scalability. This paper explores the pseudo-honeypot, a novel honeypot-like system that overcomes these drawbacks for efficient and scalable spammer sniffing. The pseudo-honeypot takes advantage of user diversity and selects normal accounts whose attributes have a higher potential of attracting spammers as its parasitic bodies. By harnessing this category of users, the pseudo-honeypot can monitor their streaming posts and behavioral patterns transparently. Compared with its traditional honeypot counterpart, the proposed solution offers substantial advantages in attribute variability, deployment flexibility, network scalability, and system portability. It also offers a novel method for collecting social network datasets that have a higher probability of including spam and spammers, without being noticed by advanced spammers. We take the Twitter social network as an example to present the system design, including pseudo-honeypot node selection, monitoring, feature extraction, ground truth labeling, and learning-based classification. Through experiments, we demonstrate the efficiency of the pseudo-honeypot in gathering spam and spammers. In particular, we confirm that our solution can garner spammers at least 19 times faster than the state-of-the-art honeypot-based counterpart.
