Paper Information - Stay On-Topic: Generating Context-specific Fake Restaurant Reviews

Automatically generated fake restaurant reviews are a threat to online review systems. Recent research has shown that users have difficulty detecting machine-generated fake reviews hidden among real restaurant reviews. The method used in that work (char-LSTM) has one drawback: it has difficulty staying in context, i.e. when it generates a review for a specific target entity, the resulting review may contain phrases that are unrelated to the target, thus increasing its detectability. In this work, we present and evaluate a more sophisticated technique based on neural machine translation (NMT) with which we can generate reviews that stay on-topic. We test multiple variants of our technique using native English speakers on Amazon Mechanical Turk. We demonstrate that reviews generated by the best variant have almost optimal undetectability (class-averaged F-score 47%). We conduct a user study with experienced users and show that our method evades detection more frequently than the state of the art (average evasion 3.2 / 4 vs 1.5 / 4), with statistical significance at level \(\alpha = 1\%\) (Sect. 4.3). We develop very effective detection tools and reach an average F-score of \(97\%\) in classifying these reviews. Although fake reviews are very effective in fooling people, effective automatic detection is still feasible.
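To make the "class-averaged F-score" figure concrete, the sketch below computes it as the unweighted mean of the per-class F-scores for the "real" and "fake" classes. This reading (macro-averaged F1) is an assumption, since the abstract does not define the metric, and the labels and judgments are hypothetical toy data, not results from the paper. Under this reading, a judge who cannot tell the classes apart on a balanced set lands near 50%, which would explain why 47% is described as almost optimal undetectability.

```python
def f_score(tp, fp, fn):
    """F-score (F1) of one class from its true/false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def class_averaged_f_score(labels, predictions, classes=("real", "fake")):
    """Unweighted mean of per-class F-scores (assumed meaning of 'class-averaged')."""
    scores = []
    for c in classes:
        pairs = list(zip(labels, predictions))
        tp = sum(1 for y, p in pairs if y == c and p == c)
        fp = sum(1 for y, p in pairs if y != c and p == c)
        fn = sum(1 for y, p in pairs if y == c and p != c)
        scores.append(f_score(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 real and 4 fake reviews, judged no better than chance.
labels = ["real"] * 4 + ["fake"] * 4
predictions = ["real", "real", "fake", "fake", "real", "real", "fake", "fake"]

score = class_averaged_f_score(labels, predictions)
print(f"class-averaged F-score: {score:.2f}")
```

A detector that separates the two classes well scores close to 1.0 on the same metric, which is the regime of the 97% figure reported for the automatic detection tools.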
