ND-Sync: Detecting Synchronized Fraud Activities

Given the retweeting activity for the posts of several Twitter users, how can we distinguish organic activity from spammy retweets by paid followers meant to boost a post’s apparent popularity? More generally, given groups of observations, can we spot strange groups? Our main intuition is that organic behavior has more variability, while fraudulent behavior, like retweets by botnet members, is more synchronized. We refer to the detection of such synchronized observations as the Synchronization Fraud problem, and we study a specific instance of it, Retweet Fraud Detection, as manifested in Twitter. Here, we propose: (A) ND-Sync, an efficient method for detecting group fraud, and (B) a set of carefully designed features for characterizing retweet threads. ND-Sync is effective in spotting retweet fraudsters, robust to different types of abnormal activity, and adaptable, as it can easily incorporate additional features. Our method achieves 97% accuracy on a real dataset of 12 million retweets crawled from Twitter.
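The intuition that synchronized groups show less variability than organic ones can be illustrated with a minimal sketch. This is not ND-Sync itself: the two features (retweet count and mean seconds until retweet), the toy values, and the function names `spread` and `rank_by_synchronization` are all assumptions made for illustration, and a realistic version would at least normalize the features before comparing distances.

```python
from itertools import combinations
from math import dist
from statistics import mean


def spread(threads):
    """Mean pairwise Euclidean distance between thread feature vectors.

    A small value means the threads look alike (more synchronized).
    With fewer than two threads there is nothing to compare, so the
    function returns infinity and the user ranks last.
    """
    pairs = list(combinations(threads, 2))
    if not pairs:
        return float("inf")
    return mean(dist(a, b) for a, b in pairs)


def rank_by_synchronization(users):
    """Order user names from most to least synchronized (smallest spread first)."""
    return sorted(users, key=lambda name: spread(users[name]))


# Hypothetical toy data: one tuple per retweet thread,
# (retweet count, mean seconds until retweet). Not taken from the paper.
users = {
    "organic": [(12.0, 340.0), (85.0, 1200.0), (3.0, 45.0), (40.0, 600.0)],
    "suspect": [(50.0, 30.0), (51.0, 31.0), (50.0, 29.0), (49.0, 30.0)],
}

ranking = rank_by_synchronization(users)
```

In this toy setting the user whose threads are nearly identical ranks first. Mean pairwise distance is only one possible measure of variability; it was chosen here because it needs no parameters and no external libraries.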
