Stay On-Topic: Generating Context-specific Fake Restaurant Reviews

Automatically generated fake restaurant reviews are a threat to online review systems. Recent research has shown that users have difficulty detecting machine-generated fake reviews hidden among real restaurant reviews. The method used in that work (char-LSTM) has one drawback: it has difficulty staying in context, i.e. when it generates a review for a specific target entity, the resulting review may contain phrases that are unrelated to the target, which makes it easier to detect. In this work, we present and evaluate a more sophisticated technique based on neural machine translation (NMT) with which we can generate reviews that stay on-topic. We test multiple variants of our technique using native English speakers on Amazon Mechanical Turk. We demonstrate that reviews generated by the best variant have almost optimal undetectability (class-averaged F-score 47%). We conduct a user study with experienced users and show that our method evades detection more frequently than the state of the art (average evasion 3.2 / 4 vs 1.5 / 4), a difference that is statistically significant at level \(\alpha = 1\%\) (Sect. 4.3). We also develop very effective detection tools and reach an average F-score of \(97\%\) in classifying these generated reviews. Although the fake reviews are very effective at fooling people, effective automatic detection is still feasible.
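As a rough illustration of the class-averaged F-score quoted above, the sketch below computes the per-class F-score for the "real" and "fake" labels and averages them. It assumes the standard F1 definition and equal class weights, which the abstract does not spell out. The function names and the toy labels are hypothetical and are not taken from the paper. Under these assumptions, a rater who is right only half the time on a balanced set scores 50%, which is why a value near 47% suggests the generated reviews are close to indistinguishable from real ones.

```python
def f_score(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def class_averaged_f_score(y_true, y_pred, classes=("real", "fake")):
    """Unweighted mean of the per-class F1 scores (assumed definition)."""
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        scores.append(f_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical toy data: a rater who gets half of a balanced set right.
truth = ["real", "real", "fake", "fake"]
guess = ["real", "fake", "real", "fake"]
chance_like = class_averaged_f_score(truth, guess)
perfect = class_averaged_f_score(truth, truth)
```

Here `chance_like` is the score of the half-right rater and `perfect` is the score of a rater who labels every review correctly.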
