Securing Behavior-based Opinion Spam Detection

Review spam is prevalent in e-commerce, where it is used to maliciously manipulate product rankings and customer purchase decisions. While spam generated by simple spamming strategies can be detected effectively, hardened spammers can evade regular detectors via more advanced strategies. Previous work has focused on evasion of text- and graph-based detectors, while evasion of behavior-based detectors has been largely ignored, leaving vulnerabilities in spam detection systems. Since real evasion data are scarce, we first propose EMERAL (Evasion via Maximum Entropy and Rating sAmpLing) to generate spam that evades certain existing detectors. EMERAL can simulate spammers with different goals and different levels of knowledge about the detectors, targeting different stages of the life cycle of the target products. We show that in the evasion-defense dynamic, only a few evasion types are meaningful to spammers, and no spammer can evade many detection signals at the same time. We also reveal that some evasions are quite insidious and can defeat all detection signals. We then propose DETER (Defense via Evasion generaTion using EmeRal), which re-trains the detection model on diverse evasive samples generated by EMERAL. Experiments confirm that DETER is more accurate in detecting both suspicious time windows and individual spam reviews. In terms of security, DETER is versatile enough to be vaccinated against diverse and unexpected evasions, is agnostic to the evasion strategy, and can be released without privacy concerns.
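The abstract does not spell out EMERAL's sampling procedure or DETER's re-training loop, so the following is only a minimal sketch of one plausible reading: spam ratings are drawn from a maximum-entropy distribution constrained to push the product toward a target average (making the rating histogram as flat, and hence as inconspicuous, as possible), and the behavior-based detector is then re-trained on a mix of organic data and these generated evasive samples. All function names, features, and parameters below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of EMERAL-style rating sampling and DETER-style
# re-training; feature layout and names are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

RATINGS = np.array([1, 2, 3, 4, 5])

def max_entropy_rating_dist(target_mean, tol=1e-6):
    """Maximum-entropy distribution over 1..5 stars with a fixed mean.
    With only a mean constraint, the solution has the form p_r ∝ exp(lam * r);
    lam is found by bisection so that the distribution's mean hits target_mean."""
    def mean_for(lam):
        w = np.exp(lam * RATINGS)
        p = w / w.sum()
        return float(p @ RATINGS)
    lo, hi = -20.0, 20.0            # lam = 0 recovers the uniform distribution
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    w = np.exp(lo * RATINGS)
    return w / w.sum()

def sample_evasive_ratings(n_reviews, target_mean, rng):
    """Sample a batch of spam ratings that shifts the product toward
    target_mean while keeping the histogram as high-entropy (flat) as
    possible, so distribution-based detection signals see less skew."""
    p = max_entropy_rating_dist(target_mean)
    return rng.choice(RATINGS, size=n_reviews, p=p)

def retrain_with_evasions(X_clean, y_clean, X_evasive):
    """DETER-style defense sketch: append behavior features extracted from
    EMERAL-generated reviews, labeled as spam (1), and refit the detector."""
    X = np.vstack([X_clean, X_evasive])
    y = np.concatenate([y_clean, np.ones(len(X_evasive))])
    return LogisticRegression(max_iter=1000).fit(X, y)

# Example use: promote a product toward a 4.5-star average with 50 fake reviews.
rng = np.random.default_rng(0)
fake_ratings = sample_evasive_ratings(50, target_mean=4.5, rng=rng)
```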
