A classification of response scale characteristics that affect data quality: a literature review

A considerable body of research examines the relationship between the characteristics of survey response scales and the quality of responses. However, it is often difficult to extract practical rules for questionnaire design from this large and often mixed body of empirical evidence. The aim of this study is twofold: first, to classify the characteristics of response scales mentioned in the literature that should be considered when developing a scale; and second, to summarize the main conclusions of the literature regarding the impact of these characteristics on data quality. The paper thus provides an updated and detailed classification of the design decisions that matter in questionnaire development. It distinguishes between characteristics that have been demonstrated to have an impact, characteristics for which no impact has been found, and characteristics for which more research is needed before a conclusion can be drawn.
