BIRDNEST: Bayesian Inference for Ratings-Fraud Detection

Review fraud is a pervasive problem in online commerce: fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including (1) they occur in short bursts of time and (2) fraudulent user accounts have skewed rating distributions. However, these may not both be true in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews that combines these two signals in a principled manner, allowing successful detection even when one of the signs is absent. To combine them, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on this model, we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated, and all were identified as fraudulent by domain experts at Flipkart.
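To make the idea of combining the two signals concrete, the following is a minimal sketch and not the BIRD model or the NEST metric as defined in the paper. It assumes a simple stand-in: each user's rating distribution and inter-review gap distribution are smoothed toward the global distributions with Dirichlet pseudo-counts, and the two KL divergences from the global distributions are summed. All names here (`RATINGS`, `GAPS`, `gap_bucket`, `suspiciousness`, the `strength` parameter, and the bucket boundaries) are hypothetical choices made for illustration.

```python
import math
from collections import Counter

RATINGS = [1, 2, 3, 4, 5]               # assumed 5-star rating scale
GAPS = ["<1h", "<1d", "<1w", ">=1w"]    # assumed buckets for time between reviews


def gap_bucket(seconds):
    """Map a gap between consecutive reviews (in seconds) to a coarse bucket."""
    if seconds < 3600:
        return "<1h"
    if seconds < 86400:
        return "<1d"
    if seconds < 604800:
        return "<1w"
    return ">=1w"


def smoothed_dist(values, categories, prior):
    """Posterior-mean categorical distribution under Dirichlet pseudo-counts."""
    counts = Counter(values)
    total = len(values) + sum(prior[c] for c in categories)
    return {c: (counts[c] + prior[c]) / total for c in categories}


def kl(p, q):
    """KL divergence KL(p || q) for distributions over the same categories."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)


def suspiciousness(ratings, timestamps, global_rating, global_gap, strength=5.0):
    """Sum of rating-skew and timing-burst divergences from global behavior.

    The prior is centered on the global distributions, so users with few
    reviews are shrunk toward typical behavior and receive low scores.
    """
    prior_r = {c: strength * global_rating[c] for c in RATINGS}
    prior_g = {c: strength * global_gap[c] for c in GAPS}
    ts = sorted(timestamps)
    gaps = [gap_bucket(b - a) for a, b in zip(ts, ts[1:])]
    user_r = smoothed_dist(ratings, RATINGS, prior_r)
    user_g = smoothed_dist(gaps, GAPS, prior_g)
    return kl(user_r, global_rating) + kl(user_g, global_gap)
```

Because the two divergences are added, a user who deviates on only one signal (for example, all 5-star ratings posted at an ordinary pace) still scores above a typical user, which mirrors the motivation stated in the abstract. The shrinkage toward the global distributions is one simple way to avoid flagging users who have written only a handful of reviews.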
