Attention Please: Your Attention Check Questions in Survey Studies Can Be Automatically Answered

Attention check questions are widely used in online surveys on popular crowdsourcing platforms as a key mechanism for filtering out inattentive respondents and improving data quality. However, little research has examined the vulnerabilities of this quality control mechanism, which can allow attackers, including irresponsible and malicious respondents, to answer attention check questions automatically and thus achieve their goals efficiently. In this paper, we perform the first study of such vulnerabilities and demonstrate that attackers can leverage deep learning techniques to pass attention check questions automatically. We propose AC-EasyPass, an attack framework with a concrete model that combines a convolutional neural network with weighted feature reconstruction to pass attention check questions with ease. We construct the first attention check question dataset, consisting of both original and augmented questions, and use it to demonstrate the effectiveness of AC-EasyPass. We explore two simple defense methods for survey designers, adding adversarial sentences and adding typos, to mitigate the risks posed by AC-EasyPass; however, both defenses are fragile due to technical and usability limitations, underscoring how challenging defense is. We hope our work will draw the research community's attention toward developing more robust attention check mechanisms. More broadly, we intend to prompt the research community to seriously consider the emerging risks that malicious uses of machine learning pose to the quality, validity, and trustworthiness of crowdsourcing and social computing.
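The abstract names the building blocks of AC-EasyPass (a convolutional neural network plus weighted feature reconstruction) without disclosing the architecture. Purely as a minimal sketch of what a CNN-based answer-selection attack of this kind might look like, the following PyTorch snippet scores (question, candidate answer) pairs and submits the highest-scoring candidate. The class name CNNAnswerScorer, all hyperparameters, and the reading of "weighted feature reconstruction" as a learned per-dimension re-weighting of the question features are our assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNAnswerScorer(nn.Module):
    """Scores how well a candidate answer matches an attention check question.

    Both texts are embedded, passed through a shared 1-D convolution, and
    max-pooled into fixed-size feature vectors; a learned per-dimension
    weight vector re-weights the question features before the pair is scored
    (our stand-in for the paper's "weighted feature reconstruction").
    """

    def __init__(self, vocab_size, embed_dim=300, num_filters=128, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        # Assumed re-weighting: learned weights emphasizing the feature
        # dimensions that matter most for question-answer matching.
        self.feature_weights = nn.Parameter(torch.ones(num_filters))
        self.out = nn.Linear(num_filters * 2, 1)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, num_filters)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = F.relu(self.conv(x))                    # (batch, num_filters, seq_len)
        return x.max(dim=2).values                  # max-pool over time

    def forward(self, question_ids, answer_ids):
        q = self.encode(question_ids) * self.feature_weights
        a = self.encode(answer_ids)
        # Concatenate both representations and score the pair.
        return self.out(torch.cat([q, a], dim=1)).squeeze(-1)

# At attack time, the candidate answer with the highest score is submitted:
#   scores = model(question_ids.expand(len(candidate_ids), -1), candidate_ids)
#   best = candidates[scores.argmax()]
```

Any model of this shape would be trained on (question, answer, label) triples, which is why the paper's attention check dataset of original and augmented questions matters: without such a corpus, an attacker has nothing to fit the scorer to.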
