Efficiently Detecting Web Spambots in a Temporally Annotated Sequence

Web spambots are becoming more advanced, using techniques that can defeat existing spam detection algorithms. These techniques include performing a series of malicious actions with variable time delays, repeating the same series of malicious actions multiple times, and interleaving legitimate (decoy) actions with malicious ones. Existing methods based on string pattern matching cannot detect spambots that use these techniques. In response, we define a new problem for detecting such spambots and propose an efficient algorithm to solve it. Given a dictionary \(\hat{S}\) of temporally annotated sequences modeling spambot actions, each associated with a time window, a long temporally annotated sequence \(T\) modeling a user action log, and parameters \(f\) and \(k\), the problem asks for each sequence in \(\hat{S}\) that occurs in \(T\) at least \(f\) times within its associated time window, with at most \(k\) mismatches per occurrence. Our algorithm solves the problem exactly, requires linear time and space, and combines advanced data structures with the Kangaroo method.
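The problem statement can be illustrated with a small brute-force sketch. This is not the linear-time algorithm of the paper: it scans every alignment, and the longest-common-extension step, which the Kangaroo method answers in constant time from a precomputed index, is replaced here by a naive scan. The encoding is assumed for illustration only: each action is a single character paired with an integer timestamp, and an occurrence counts as "within the time window" when the gap between its first and last timestamps does not exceed the window. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical encoding: (single-character action, integer timestamp).
Event = Tuple[str, int]


def lce(a: str, i: int, b: str, j: int) -> int:
    """Length of the longest common prefix of a[i:] and b[j:].

    Naive scan; an indexed solution would answer this in O(1).
    """
    n = 0
    while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
        n += 1
    return n


def matches_with_k_mismatches(text: str, pos: int, pat: str, k: int) -> bool:
    """Kangaroo-style check of pat against text[pos:pos + len(pat)].

    Jump over each matching run, then spend one mismatch to step past
    the differing position. Assumes pos + len(pat) <= len(text).
    """
    i = 0
    mismatches = 0
    while i < len(pat):
        i += lce(text, pos + i, pat, i)  # skip the matching run
        if i >= len(pat):
            break
        mismatches += 1                  # position i differs
        if mismatches > k:
            return False
        i += 1
    return True


def count_occurrences(log: List[Event], pattern: str, window: int, k: int) -> int:
    """Count alignments of a non-empty pattern that fit the window
    (assumed: last timestamp minus first) with at most k mismatches."""
    actions = "".join(a for a, _ in log)
    times = [t for _, t in log]
    m = len(pattern)
    count = 0
    for pos in range(len(log) - m + 1):
        if times[pos + m - 1] - times[pos] > window:
            continue  # occurrence spans more time than allowed
        if matches_with_k_mismatches(actions, pos, pattern, k):
            count += 1
    return count


def is_spambot_pattern(log: List[Event], pattern: str, window: int,
                       f: int, k: int) -> bool:
    """Report the pattern if it occurs at least f times under the constraints."""
    return count_occurrences(log, pattern, window, k) >= f
```

A dictionary \(\hat{S}\) would be handled by calling `is_spambot_pattern` once per sequence with that sequence's own window. Doing so costs time proportional to the log length times the pattern length for every sequence, which is the cost the indexed approach described above is meant to avoid.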
