A Summary of Survey Methodology Best Practices for Security and Privacy Researchers

“Given a choice between dancing pigs and security, users will pick dancing pigs every time,” warns an oft-cited quote from well-known security researcher Bruce Schneier [132]. The problem of making security tools and mechanisms work better for humans (often categorized as usability, broadly construed) has become increasingly important over the past 17 years [7], [159], as illustrated by the growing body of research. Usable security and privacy research has improved our understanding of how to help users stay safe from phishing attacks [12], [62], [77], [105], [109], [129], [138], create strong passwords [39], [73], [130], [152], and control access to their accounts [16], [33], [93], [139], as just three examples. One key technique for understanding and improving how human decision making affects security is the gathering of self-reported data from users. This data is typically gathered via survey and interview studies, and serves to inform the broader security and privacy community about user needs, behaviors, and beliefs. The quality of this data, and the validity of subsequent research results, depends on the choices researchers make when designing their experiments. Other fields that rely on self-report data, such as the health and social sciences, have established guidelines and recommendations for collecting high-quality self-report data [10], [42], [55], [57], [70], [82], [98], [103], [119], [136], [148], [149]. This summary presents a set of essential guidelines for conducting self-report usability studies, distilled from prior work in survey methodology and related fields.

[1]  Daniel Corstange,et al.  Sensitive Questions, Truthful Answers? Modeling the List Experiment with LISTIT , 2009, Political Analysis.

[2]  R. Garland The Mid-Point on a Rating Scale: Is it Desirable? , 1991 .

[3]  Roger Tourangeau,et al.  What is sexual harassment? It depends on who asks! Framing effects on survey responses , 2007 .

[4]  G. Kalton,et al.  The treatment of missing survey data , 1986 .

[5]  D. Dillman,et al.  Mail and telephone surveys , 1978 .

[6]  M. Marshall Sampling for qualitative research. , 1996, Family practice.

[7]  Diana K. Smetters,et al.  How users use access control , 2009, SOUPS.

[8]  Terry S. Overton,et al.  Estimating Nonresponse Bias in Mail Surveys , 1977 .

[9]  F. J. Fowler,et al.  Standardized Survey Interviewing: Minimizing Interviewer-Related Error. , 1989 .

[10]  F. Conrad,et al.  Visual context effects in web surveys , 2007 .

[11]  Mark Ciampa,et al.  A comparison of password feedback mechanisms and their impact on password entropy , 2013, Inf. Manag. Comput. Secur..

[12]  Matthew N. Beckmann,et al.  What Leads to Voting Overreports? Contrasts of Overreporters to Validated Voters and Admitted Nonvoters in the American National Election Studies , 2001 .

[13]  Tracy Kelleher,et al.  Questionnaire order significantly increased response to a postal survey sent to primary care physicians. , 2008, Journal of clinical epidemiology.

[14]  L. Rips,et al.  The Psychology of Survey Response , 2000 .

[15]  Eszter Hargittai,et al.  Whose Space? Differences Among Users and Non-Users of Social Network Sites , 2007, J. Comput. Mediat. Commun..

[16]  Floyd J. Fowler,et al.  Reducing Interviewer‐Related Error Through Interviewer Training, Supervision, and Other Means , 2011 .

[17]  E. McColl Cognitive Interviewing. A Tool for Improving Questionnaire Design , 2006, Quality of Life Research.

[18]  F. J. Fowler,et al.  How unclear terms affect survey data. , 1992, Public opinion quarterly.

[19]  Zinta S. Byrne,et al.  The Psychology of Security for the Home Computer User , 2012, 2012 IEEE Symposium on Security and Privacy.

[20]  Joseph Bonneau,et al.  Learning Assigned Secrets for Unlocking Mobile Devices , 2015, SOUPS.

[21]  Howard Schuman,et al.  WHITE RESPONDENTS AND RACE-OF-INTERVIEWER EFFECTS , 1975 .

[22]  Peter V. Miller,et al.  Web Survey Methods Introduction , 2008 .

[23]  W. Belson,et al.  The effects of reversing the order of presentation of verbal rating scales in survey interviews , 1969 .

[24]  Serge Egelman,et al.  Scaling the Security Wall: Developing a Security Behavior Intentions Scale (SeBIS) , 2015, CHI.

[25]  Michael Hennessy,et al.  An Evaluation of the Validity and Reliability of Survey Response Data On Household Electricity Conservation , 1985 .

[26]  Linda B. Bourque,et al.  How to Conduct Self-Administered and Mail Surveys , 1995 .

[27]  Michael Fendrich,et al.  DIMINISHED LIFETIME SUBSTANCE USE OVER TIME: AN INQUIRY INTO DIFFERENTIAL UNDERREPORTING , 1994 .

[28]  Krista Casler,et al.  Separate but equal? A comparison of participants and data gathered via Amazon's MTurk, social media, and face-to-face behavioral testing , 2013, Comput. Hum. Behav..

[29]  Steven M. Bellovin,et al.  Facebook and privacy: it's complicated , 2012, SOUPS.

[30]  J. Sitzia,et al.  Good practice in the conduct and reporting of survey research. , 2003, International journal for quality in health care : journal of the International Society for Quality in Health Care.

[31]  W. Graham,et al.  The importance of conducting and reporting pilot studies: the example of the Scottish Births Survey. , 2001, Journal of advanced nursing.

[32]  P. Chisnall Mail and Internet Surveys: The Tailored Design Method , 2007, Journal of Advertising Research.

[33]  J. B. Brooke,et al.  SUS: A 'Quick and Dirty' Usability Scale , 1996 .

[34]  Matthew Smith,et al.  Helping Johnny 2.0 to encrypt his Facebook conversations , 2012, SOUPS.

[35]  Nicolas Christin,et al.  It's All about the Benjamins: An Empirical Study on Incentivizing Users to Ignore Security Advice , 2011, Financial Cryptography.

[36]  F. Kreuter,et al.  Social Desirability Bias in CATI, IVR, and Web Surveys: The Effects of Mode and Question Sensitivity , 2008 .

[37]  J C Duffy,et al.  Under-reporting of alcohol consumption in sample surveys: the effect of computer interviewing in fieldwork. , 1984, British journal of addiction.

[38]  Kosuke Imai,et al.  Design and Analysis of the Randomized Response Technique , 2015 .

[39]  Roger Tourangeau,et al.  Response Order Effects in Dichotomous Categorical Questions Presented Orally: The Impact of Question and Respondent Attributes , 2007 .

[40]  I. Coyne Sampling in qualitative research. Purposeful and theoretical sampling; merging or clear boundaries? , 1997, Journal of advanced nursing.

[41]  Elissa M. Redmiles,et al.  How I Learned to be Secure: a Census-Representative Survey of Security Advice Sources and Behavior , 2016, CCS.

[42]  Dan R. Dalton,et al.  USING THE UNMATCHED COUNT TECHNIQUE (UCT) TO ESTIMATE BASE RATES FOR SENSITIVE BEHAVIOR , 1994 .

[43]  詹志禹 Response order effects in Likert-type scales , 1991 .

[44]  McKenney Nr,et al.  Issues regarding data on race and ethnicity: the Census Bureau experience. , 1994 .

[45]  Harry Hochheiser,et al.  Research Methods for Human-Computer Interaction , 2008 .

[46]  S. Presser,et al.  Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context , 1996 .

[47]  S. Day,et al.  Survey Questions: Handcrafting the Standardized Questionnaire. , 1987 .

[48]  J. Krosnick,et al.  Survey research. , 1999, Annual review of psychology.

[49]  Jennifer Preece,et al.  Electronic Survey Methodology: A Case Study in Reaching Hard-to-Involve Internet Users , 2003, Int. J. Hum. Comput. Interact..

[50]  James D. Wright,et al.  Handbook of Survey Research. , 1985 .

[51]  Zeynep Tufekci,et al.  Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls , 2014, ICWSM.

[52]  Martin Wetzels,et al.  Response Rate and Response Quality of Internet-Based Surveys: An Experimental Study , 2004 .

[53]  J. Hartley,et al.  The effects of changes in the order of verbal labels and numerical values on children’s scores on attitude and rating scales , 2012 .

[54]  Katerine Osatuke,et al.  Demographic Question Placement: Effect on Item Response Rates and Means of a Veterans Health Administration Survey , 2012 .

[55]  James Chromy,et al.  Impact of Interviewer Experience on Respondent Reports of Substance Use , 2002 .

[56]  Floyd J. Fowler,et al.  Improving Survey Questions: Design and Evaluation , 1995 .

[57]  P. Squire,et al.  WHY THE 1936 LITERARY DIGEST POLL FAILED , 1988 .

[58]  W. Belson,et al.  The effects of reversing the presentation order of verbal rating scales , 1966 .

[59]  Richard N. Landers,et al.  An Inconvenient Truth: Arbitrary Distinctions Between Organizational, Mechanical Turk, and Other Convenience Samples , 2015, Industrial and Organizational Psychology.

[60]  Amar Cheema,et al.  Data collection in a flat world: the strengths and weaknesses of mechanical turk samples , 2013 .

[61]  Elizabeth Martin,et al.  CONTEXT EFFECTS FOR CENSUS MEASURES OF RACE AND HISPANIC ORIGIN , 1990 .

[62]  Rick Wash,et al.  Too Much Knowledge? Security Beliefs and Protective Behaviors Among United States Internet Users , 2015, SOUPS.

[63]  P. Biernacki,et al.  Snowball Sampling: Problems and Techniques of Chain Referral Sampling , 1981 .

[64]  Barbara Mirel,et al.  Usability and hardcopy manuals: evaluating research designs and methods , 1990, SIGDOC '90.

[65]  Seymour Sudman,et al.  Improving Interview Method and Questionnaire Design: Response Effects to Threatening Questions in Survey Research. , 1980 .

[66]  Michael D. Buhrmester,et al.  Amazon's Mechanical Turk , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[67]  A. Colman,et al.  Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. , 2000, Acta psychologica.

[68]  S. Berg Snowball Sampling—I , 2006 .

[69]  Sören Preibusch,et al.  Guide to measuring privacy concern: Review of survey and observational instruments , 2013, Int. J. Hum. Comput. Stud..

[70]  Roger Tourangeau,et al.  Eye-Tracking Data: New Insights on Response Order Effects and Other Cognitive Shortcuts in Survey Responding. , 2008, Public opinion quarterly.

[71]  David Ma,et al.  Does domain highlighting help people identify phishing sites? , 2011, CHI.

[72]  Ponnurangam Kumaraguru,et al.  Who falls for phish?: a demographic analysis of phishing susceptibility and effectiveness of interventions , 2010, CHI.

[73]  N. Schwarz Self-reports: How the questions shape the answers. , 1999 .

[74]  奥村 香保里,et al.  Introduction to "Sleights of Privacy: Framing, Disclosures, and the Limits of Transparency" , 2013 .

[75]  Jessica Staddon,et al.  Are privacy concerns a turn-off?: engagement and privacy in social networks , 2012, SOUPS.

[76]  G. Willis,et al.  Does Pretesting Make a Difference? An Experimental Test , 2004 .

[77]  M. Couper,et al.  Picture This! Exploring Visual Effects in Web Surveys , 2004 .

[78]  Sherri L. Jackson Research Methods and Statistics: A Critical Thinking Approach , 2005 .

[79]  W. Belson,et al.  The design and understanding of survey questions , 1982 .

[80]  Laura A. Dabbish,et al.  Privacy Attitudes of Mechanical Turk Workers and the U.S. Public , 2014, SOUPS.

[81]  M. Couper,et al.  Web Surveys , 2001 .

[82]  Pamela Campanelli,et al.  Testing Survey Questions: New Directions in Cognitive Interviewing , 1997 .

[83]  Eric Sundstrom,et al.  Questionnaire design, return rates, and response favorableness in an employee attitude questionnaire. , 1990 .

[84]  Stuart E. Schechter,et al.  The Emperor's New Security Indicators , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[85]  Maria Krysan,et al.  Privacy and the expression of white racial attitudes: A comparison across three contexts , 1998 .

[86]  Shari Lawrence Pfleeger,et al.  Principles of survey research: part 5: populations and samples , 2002, SOEN.

[87]  D. Collins Pretesting survey instruments: An overview of cognitive methods , 2003, Quality of Life Research.

[88]  Stefan A. Robila,et al.  Don't be a phish: steps in user education , 2006, ITICSE '06.

[89]  M. Angela Sasse,et al.  Users are not the enemy , 1999, CACM.

[90]  C. Bennett,et al.  Issues regarding data on race and ethnicity: the Census Bureau experience. , 1994, Public health reports.

[91]  Susan B. Barnes,et al.  A privacy paradox: Social networking in the United States , 2006, First Monday.

[92]  Seymour Sudman,et al.  Measurement errors in surveys , 1993 .

[93]  Chung-Ping Cheng,et al.  Effects of Response Order on Likert-Type Scales , 2000 .

[94]  Robert D. Tortora,et al.  Principles for Constructing Web Surveys , 1998 .

[95]  Patrick Gage Kelley Conducting Usable Privacy & Security Studies with Amazon's Mechanical Turk , 2010 .

[96]  L. Jean Camp,et al.  Risk Communication Design: Video vs. Text , 2012, Privacy Enhancing Technologies.

[97]  Simson L. Garfinkel,et al.  Usable Security: History, Themes, and Challenges , 2014, Usable Security: History, Themes, and Challenges.

[98]  Lujo Bauer,et al.  A user study of policy creation in a flexible access-control system , 2008, CHI.

[99]  G. Albaum The Likert scale revisited: an alternate version , 1997 .

[100]  Ben Jann,et al.  Sensitive Questions in Online Surveys: Experimental Results for the Randomized Response Technique (RRT) and the Unmatched Count Technique (UCT) , 2011 .

[101]  H Storm [Demographic measures]. , 1984, Maandstatistiek van de bevolking.

[102]  Steve Love,et al.  A game design framework for avoiding phishing attacks , 2013, Comput. Hum. Behav..

[103]  Blase Ur,et al.  How Does Your Password Measure Up? The Effect of Strength Meters on Password Creation , 2012, USENIX Security Symposium.

[104]  S L Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias. , 1965, Journal of the American Statistical Association.

[105]  A. Laurent,et al.  A summary of studies of interviewing methodology. , 1978, Vital and health statistics. Series 2, Data evaluation and methods research.

[106]  I B Pless,et al.  A comparison of observed and reported restraint use by children and adults. , 1985, Accident; analysis and prevention.

[107]  James Hartley,et al.  Some thoughts on Likert-type scales , 2014 .

[108]  Stanley Presser,et al.  SURVEY PRETESTING: DO DIFFERENT METHODS PRODUCE DIFFERENT RESULTS? , 1994 .

[109]  R. Tourangeau,et al.  Sensitive questions in surveys. , 2007, Psychological bulletin.

[110]  Tom W. Smith,et al.  ASKING SENSITIVE QUESTIONS: THE IMPACT OF DATA COLLECTION MODE, QUESTION FORMAT, AND QUESTION CONTEXT , 1996 .

[111]  Maurice C. Bryson,et al.  The Literary Digest Poll: Making of a Statistical Myth , 1976 .

[112]  Kim Bartel Sheehan,et al.  E-mail Survey Response Rates: A Review , 2006, J. Comput. Mediat. Commun..

[113]  E. Kane,et al.  INTERVIEWER GENDER AND GENDER ATTITUDES , 1993 .

[114]  Floyd J. Fowler,et al.  Survey Research Methods , 1984 .

[115]  Frederick G. Conrad,et al.  Sample Size for Cognitive Interview Pretesting , 2011 .

[116]  Hugh J. Parry,et al.  Validity of Responses to Survey Questions , 1950 .

[117]  J. Doug Tygar,et al.  Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0 , 1999, USENIX Security Symposium.

[118]  Panagiotis G. Ipeirotis Demographics of Mechanical Turk , 2010 .

[119]  Luís Carriço,et al.  Snooping on Mobile Phones: Prevalence and Trends , 2016, SOUPS.

[120]  Adam Glynn What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment , 2013 .

[121]  Eszter Hargittai,et al.  Survey Measures of Web-Oriented Digital Literacy , 2005 .

[122]  Lorrie Faith Cranor,et al.  Teaching Johnny not to fall for phish , 2010, TOIT.

[123]  Jon A. Krosnick,et al.  Comparing the Quality of Data Obtained by Minimally Balanced and Fully Balanced Attitude Questions , 2005 .

[124]  Jon A. Krosnick,et al.  Research Synthesis: AAPOR Report on Online Panels , 2010 .

[125]  Douglas Currivan,et al.  Methods for Testing and Evaluating Survey Questionnaires (review) , 2006 .

[126]  Elizabeth Martin,et al.  METHODS FOR TESTING AND EVALUATING SURVEY QUESTIONS , 2004 .

[127]  G. Willis,et al.  Research Synthesis: The Practice of Cognitive Interviewing , 2007 .

[128]  Robert W. Covert,et al.  Designing and Constructing Instruments for Social Research and Evaluation , 2007 .

[129]  H. Schuman,et al.  The Effect of the Question on Survey Responses: A Review , 1982 .

[130]  Xiang Cao,et al.  Intentional access management: making access control usable for end-users , 2006, SOUPS '06.

[131]  Masahiro Fujita,et al.  An Attempt to Memorize Strong Passwords while Playing Games , 2015, 2015 18th International Conference on Network-Based Information Systems.

[132]  Margaret Volante Qualitative research. , 2008, Nurse researcher.

[133]  B. Whitley Principles of research in behavioral science , 1996 .

[134]  Anne Johanne Søgaard,et al.  Health Study: The impact of self-selection in a large, population-based survey , 2015 .

[135]  Danna L. Moore,et al.  Measuring and Improving Telephone Interviewer Performance and Productivity , 2007 .

[136]  J. Conley Asking questions: A practical guide to questionnaire design , 1983 .

[137]  M. Traugott,et al.  Web survey design and administration. , 2001, Public opinion quarterly.

[138]  Fowler,et al.  Survey research methods, 2nd ed. , 2009 .

[139]  Ulf-Dietrich Reips,et al.  Financial Incentives, Personal Information and Drop Out in Online Studies , 2001 .

[140]  Beth L. Leech Asking Questions: Techniques for Semistructured Interviews , 2002, PS: Political Science & Politics.