Attention Please: Your Attention Check Questions in Survey Studies Can Be Automatically Answered

Attention check questions are widely used in online surveys on popular crowdsourcing platforms as a key mechanism for filtering out inattentive respondents and improving data quality. However, little research has examined the vulnerabilities of this quality control mechanism, which can allow attackers, including irresponsible and malicious respondents, to answer attention check questions automatically and thus achieve their goals efficiently. In this paper, we perform the first study of such vulnerabilities and demonstrate that attackers can leverage deep learning techniques to pass attention check questions automatically. We propose AC-EasyPass, an attack framework with a concrete model that combines a convolutional neural network with weighted feature reconstruction to pass attention check questions with ease. We construct the first attention check question dataset, consisting of both original and augmented questions, and use it to demonstrate the effectiveness of AC-EasyPass. We explore two simple defense methods for survey designers, adding adversarial sentences and adding typos, to mitigate the risks posed by AC-EasyPass; however, both defenses are fragile due to technical and usability limitations, underscoring how challenging defense is. We hope our work will draw the research community's attention toward developing more robust attention check mechanisms. More broadly, we intend to prompt the research community to seriously consider the emerging risks that malicious uses of machine learning pose to the quality, validity, and trustworthiness of crowdsourcing and social computing.
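The abstract names the building blocks of AC-EasyPass (a convolutional neural network plus weighted feature reconstruction) without disclosing the architecture. Purely as a minimal sketch of what a CNN-based answer-selection attack of this kind might look like, the following PyTorch snippet scores (question, candidate answer) pairs and submits the highest-scoring candidate. The class name CNNAnswerScorer, all hyperparameters, and the reading of "weighted feature reconstruction" as a learned per-dimension re-weighting of the question features are our assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNAnswerScorer(nn.Module):
    """Scores how well a candidate answer matches an attention check question.

    Both texts are embedded, passed through a shared 1-D convolution, and
    max-pooled into fixed-size feature vectors; a learned per-dimension
    weight vector re-weights the question features before the pair is scored
    (our stand-in for the paper's "weighted feature reconstruction").
    """

    def __init__(self, vocab_size, embed_dim=300, num_filters=128, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        # Assumed re-weighting: learned weights emphasizing the feature
        # dimensions that matter most for question-answer matching.
        self.feature_weights = nn.Parameter(torch.ones(num_filters))
        self.out = nn.Linear(num_filters * 2, 1)

    def encode(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, num_filters)
        x = self.embed(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        x = F.relu(self.conv(x))                    # (batch, num_filters, seq_len)
        return x.max(dim=2).values                  # max-pool over time

    def forward(self, question_ids, answer_ids):
        q = self.encode(question_ids) * self.feature_weights
        a = self.encode(answer_ids)
        # Concatenate both representations and score the pair.
        return self.out(torch.cat([q, a], dim=1)).squeeze(-1)

# At attack time, the candidate answer with the highest score is submitted:
#   scores = model(question_ids.expand(len(candidate_ids), -1), candidate_ids)
#   best = candidates[scores.argmax()]
```

Any model of this shape would be trained on (question, answer, label) triples, which is why the paper's attention check dataset of original and augmented questions matters: without such a corpus, an attacker has nothing to fit the scorer to.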
