Usability: Lessons Learned … and Yet to Be Learned

The philosopher of science J. W. Grove (1989) once wrote, “There is, of course, nothing strange or scandalous about divisions of opinion among scientists. This is a condition for scientific progress” (p. 133). Over the past 30 years, usability, both as a practice and as an emerging science, has had its share of controversies. It inherited some from its early roots in experimental psychology, measurement, and statistics. Others emerged as the field matured and extended into user-centered design and user experience. In many ways, a field of inquiry is shaped by its controversies. This article reviews some of the persistent controversies in usability, tracing their history and then assessing their current status from the perspective of a pragmatic practitioner. Put another way: over the past three decades, what key lessons have we learned, and what remains to be learned? Some of the key lessons learned are:

• When discussing usability, it is important to distinguish between the goals and practices of summative and formative usability.
• There is compelling rational and empirical support for iterative formative usability testing; it appears to be effective in improving both objective and perceived usability.
• When conducting usability studies, practitioners should use one of the currently available standardized usability questionnaires.
• Because “magic number” rules of thumb for usability test sample sizes are optimal only under very specific conditions, practitioners should use the available sample size estimation tools rather than rely on such rules.
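
The System Usability Scale (SUS; Brooke, 1996, [165]) is one example of a standardized questionnaire. It has ten items rated from 1 (strongly disagree) to 5 (strongly agree). The odd-numbered items are worded positively and the even-numbered items negatively. The sketch below shows how one participant's raw responses become a 0 to 100 score. It assumes that standard 10-item form and its usual scoring rule. The function name `sus_score` and the sample responses are hypothetical illustrations, not part of any library or study.

```python
def sus_score(responses):
    """Score one participant's System Usability Scale questionnaire.

    Assumes the standard 10-item SUS with responses coded 1 (strongly
    disagree) to 5 (strongly agree). Odd-numbered items are positively
    worded and contribute (response - 1); even-numbered items are
    negatively worded and contribute (5 - response). The 0-40 sum is
    multiplied by 2.5 to give a score from 0 to 100.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item
    return total * 2.5


# Usage: one hypothetical participant's responses to items 1-10.
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 3]))  # prints 77.5
```

Reversing the contribution of the even items means a higher score always indicates better perceived usability, whatever the tone of each item. Study-level results are then typically summarized as the mean across participants.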

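The critique of “magic numbers” rests on a simple model used throughout the sample-size literature listed here (e.g., [118], [141], [164], [111], [157]). Suppose a problem occurs for each participant with probability p. Then the probability of observing it at least once in n participants is 1 - (1 - p)^n. When all problems share that p, this is also the expected proportion of problems discovered.

The minimal Python sketch below computes that quantity and inverts it to find the smallest n that meets a discovery goal. The function names are hypothetical. The sketch assumes p is known and identical across problems and participants. In practice, p is estimated from early sessions, small-sample estimates call for adjustment (see [130]), and heterogeneity in p is a known complication (see [140]).

```python
import math


def discovery_probability(p, n):
    """Probability that a problem with per-participant occurrence
    probability p is observed at least once across n participants
    (the cumulative binomial model 1 - (1 - p)**n)."""
    return 1 - (1 - p) ** n


def sample_size_for_discovery(p, goal):
    """Smallest n such that discovery_probability(p, n) >= goal.

    Assumes a single, known p shared by all problems and participants.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 < goal < 1:
        raise ValueError("goal must lie strictly between 0 and 1")

    # Closed-form solution of 1 - (1 - p)**n >= goal for n ...
    n = math.ceil(math.log(1 - goal) / math.log(1 - p))
    # ... then nudge n to absorb floating-point rounding at the boundary.
    while n > 1 and discovery_probability(p, n - 1) >= goal:
        n -= 1
    while discovery_probability(p, n) < goal:
        n += 1
    return n


# Usage: the same 85% goal needs very different sample sizes as p changes.
for p in (0.31, 0.10):
    n = sample_size_for_discovery(p, 0.85)
    print(f"p = {p:.2f}: n = {n}, expected discovery = {discovery_probability(p, n):.3f}")
# p = 0.31: n = 6, expected discovery = 0.892
# p = 0.10: n = 19, expected discovery = 0.865
```

With p = .31, five participants give an expected discovery of about 84%. A five-participant rule of thumb therefore roughly meets an 85% goal only when p is near that value. With p = .10, the same goal takes 19 participants.
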
[1] Fred D. Davis et al. (1996). A critical assessment of potential measurement biases in the technology acceptance model: three experiments. Int. J. Hum. Comput. Stud.

[2] Klaus Kaasgaard et al. (2004). Comparative usability evaluation. Behav. Inf. Technol.

[3] Morten Hertzum et al. (2009). Cultural cognition in usability evaluation. Interact. Comput.

[4] John D. Gould et al. (1987). The 1984 Olympic Message System: a test of behavioral principles of system design. CACM.

[5] James R. Lewis et al. (1990). Integrated office software benchmarks: A case study. INTERACT.

[6] J. Jackson Barnette et al. (2000). Effects of Stem and Likert Response Option Reversals on Survey Internal Consistency: If You Feel the Need, There is a Better Alternative to Using those Negatively Worded Stems.

[7] Cathleen Wharton et al. (1994). The cognitive walkthrough method: a practitioner's guide.

[8] Frederic M. Lord (1954). Further Comment on "Football Numbers".

[9] Olli Pitkänen et al. (2008). Legal research topics in user-centric services. IBM Syst. J.

[10] Vicente Moret-Bonillo et al. (2009). Usability: A Critical Analysis and a Taxonomy. Int. J. Hum. Comput. Interact.

[11] Joseph S. Dumas et al. (2007). Making usability recommendations useful and usable.

[12] Tharon Howard (2008). Unexpected complexity in a traditional usability study.

[13] Debora Shaw et al. (1996). Handbook of usability testing: How to plan, design, and conduct effective tests.

[14] Miranda G. Capra (2007). Comparing Usability Problem Identification and Description by Practitioners and Students.

[15] Chris Marshall et al. (1990). Usability of product X: lessons from a real product.

[16] Martin Schmettow et al. (2012). Sample size in usability studies. Commun. ACM.

[17] Janne Jul Jensen et al. (2008). A case study of three software projects: can software developers anticipate the usability problems in their software? Behav. Inf. Technol.

[18] S. Lilienfeld et al. (2000). The Scientific Status of Projective Techniques. Psychological science in the public interest: a journal of the American Psychological Society.

[19] Elizabeth D. Murphy et al. (2010). Think-aloud protocols: a comparison of three think-aloud protocols for use in testing data-dissemination web sites for usability. CHI.

[20] James R. Lewis et al. (2013). Psychometric Evaluation of the T-CSUQ: The Turkish Version of the Computer System Usability Questionnaire. Int. J. Hum. Comput. Interact.

[21] Vanessa Evers et al. (1997). The Role of Culture in Interface Acceptance. INTERACT.

[22] Richard B. Wright et al. (1992). Method Bias and Concurrent Verbal Protocol in Software Usability Testing.

[23] Stefan Wagner et al. (2007). A Comprehensive Model of Usability. EHCI/DS-VIS.

[24] Morten Hertzum et al. (2009). Scrutinising usability evaluation: does thinking aloud affect behaviour and mental workload? Behav. Inf. Technol.

[25] Michael R. Chernick et al. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers.

[26] Eric Harslem et al. (1987). Designing the STAR User Interface. ECICS.

[27] Kraig Finstad (2006). The system usability scale and non-native English speakers.

[28] James R. Lewis et al. (2015). A Slovene Translation of the System Usability Scale: The SUS-SI. Int. J. Hum. Comput. Interact.

[29] Jeffrey Rubin et al. (1994). Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests.


[31] M. Chignell et al. (2011). Affective Interaction: Understanding, Evaluating, and Designing for Human Emotion.

[32] Whitney Quesenbery et al. (2005). Towards the design of effective formative test reports.

[33] Neal Schmitt et al. (1985). Factors Defined by Negatively Keyed Items: The Result of Careless Respondents?

[34] Morten Hertzum et al. (2010). Images of Usability. Int. J. Hum. Comput. Interact.

[35] K. Leung et al. (2001). Personality in cultural context: methodological issues. Journal of personality.

[36] Alberto Sampaio (2013). Quantifying the user experience: practical statistics for user research by Jeff Sauro and James R. Lewis. SOEN.

[37] Gregg Skip Bailey et al. (1993). Iterative methodology and designer training in human-computer interface design. INTERCHI.

[38] W. R. Ford et al. (1981). Tutorials for the first-time computer user. IEEE Transactions on Professional Communication.

[39] D. Broadbent et al. (1990). The role of instruction and verbalization in improving performance on complex search tasks.

[40] J. F. Kelley et al. (1984). An iterative design methodology for user-friendly natural language office information applications. TOIS.

[41] Gavriel Salvendy et al. (2010). Number of people required for usability evaluation. Commun. ACM.

[42] James R. Lewis (1982). Testing Small System Customer Set-Up.

[43] Ted Boren et al. (2000). Thinking aloud: reconciling theory and practice.

[44] Thomas K. Landauer et al. (1997). Behavioral Research Methods in Human-Computer Interaction.

[45] John L. Bennett et al. (1988). Usability Engineering: Our Experience and Evolution.

[46] Frederic M. Lord et al. (1953). On the Statistical Treatment of Football Numbers.

[47] Jakob Nielsen et al. (2009). Usability.

[48] Clare-Marie Karat et al. (1997). Cost-Justifying Usability Engineering in the Software Life Cycle.

[49] Harry Hochheiser et al. (2008). Research Methods for Human-Computer Interaction.

[50] James R. Lewis (2013). Critical Review of 'The Usability Metric for User Experience'. Interact. Comput.

[51] Chester A. Schriesheim et al. (1981). Controlling Acquiescence Response Bias by Item Reversals: The Effect on Questionnaire Validity.

[52] Leonard R. Sussman et al. (1993). Nominal, Ordinal, Interval, and Ratio Typologies are Misleading.

[53] Peter J. Kennedy (1982). Development and Testing of the Operator Training Package for a Small Computer System.

[54] Javad Sadeghi (2007). Cost-Justifying Usability.

[55] Morten Hertzum et al. (2006). Problem Prioritization in Usability Evaluation: From Severity Assessments Toward Impact on Design. Int. J. Hum. Comput. Interact.

[56] Jeff Sauro et al. (2005). Estimating Completion Rates from Small Samples Using Binomial Confidence Intervals: Comparisons and Recommendations.

[57] Leslie Beth Herbert et al. (1993). A Comparison of Three Usability Evaluation Methods: Heuristic, Think-Aloud, and Performance Testing.

[58] Aaron Marcus et al. (2002). Global and intercultural user-interface design.

[59] Yuji Matsumoto et al. (2007). Opinion mining from web documents: extraction and structurization (Special Issue: Data Mining and Statistical Mathematics).

[60] Daniel M. Wildman (1995). Getting the most from paired-user testing. INTR.

[61] Ronald Baecker et al. (2008). Timelines: Themes in the early history of HCI, some unanswered questions. Interactions.

[62] Thomas S. Tullis et al. (2004). A Comparison of Questionnaires for Assessing Website Usability.

[63] Angela M. Cirucci et al. (2021). Usability Testing. UX Research Methods for Media and Communication Studies.

[64] Rebecca A. Grier et al. (2013). The System Usability Scale.

[65] D. Borsboom et al. (2009). A reanalysis of Lord's statistical treatment of football numbers.

[66] Stephen L. Vargo et al. (2008). Toward a conceptual foundation for service science: Contributions from service-dominant logic. IBM Syst. J.

[67] Anshu Agarwal et al. (2009). Beyond usability: evaluating emotional response as an integral part of the user experience. CHI Extended Abstracts.

[68] Wendy Howard et al. (2009). Unexpected complexity in user testing of information products. 2009 IEEE International Professional Communication Conference.

[69] James R. Lewis (1992). Psychometric Evaluation of the Post-Study System Usability Questionnaire: The PSSUQ.

[70] Kent L. Norman et al. (1988). Development of an instrument measuring user satisfaction of the human-computer interface. CHI '88.

[71] Joseph S. Dumas et al. (2002). User-based evaluations.

[72] Jakob Nielsen et al. (1990). Heuristic evaluation of user interfaces. CHI '90.

[73] James R. Lewis et al. (1993). Multipoint scales: Mean and median differences and observed significance levels. Int. J. Hum. Comput. Interact.

[74] Kasper Hornbæk et al. (2007). Meta-analysis of correlations among usability measures. CHI.

[75] Gilbert Cockton et al. (2001). Why and when five test users aren’t enough.

[76] Jurek Kirakowski et al. (1996). The Software Usability Measurement Inventory: Background and Usage.

[77] James T. Miller et al. (2008). An Empirical Evaluation of the System Usability Scale. Int. J. Hum. Comput. Interact.

[78] Muzakir Saifful Kamaluddin et al. (2019). Ibn al-Haytham, from Place to Space: A Comparative Approach. Philosophy East and West.

[79] Tingting Zhao et al. (2012). Exploring Think-Alouds in Usability Testing: An International Survey. IEEE Transactions on Professional Communication.

[80] James R. Lewis et al. (1995). IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. Int. J. Hum. Comput. Interact.

[81] James T. Townsend et al. (1984). Measurement Scales and Statistics: The Misconception Misconceived.

[82] A. Frye et al. (2004). Investigating the Use of Negatively Phrased Survey Items in Medical Education Settings: Common Wisdom or Common Mistake? Academic medicine: journal of the Association of American Medical Colleges.

[83] Carol M. Barnum (2010). Usability Testing Essentials: Ready, Set...Test!

[84] Jeff Sauro et al. (2009). The Factor Structure of the System Usability Scale. HCI.

[85] Robert K. Gable et al. (1990). The Impact of Positive and Negative Item Stems on the Validity of a Computer Anxiety Scale.

[86] James V. Bradley et al. (1976). Probability, decision, statistics.

[87] Chester A. Schriesheim et al. (1981). The Effect of Grouping or Randomizing Items on Leniency Response Bias.

[88] K. A. Ericsson et al. (1980). Verbal reports as data.

[89] P. Maglio et al. (2008). The Emergence of Service Science: Toward Systematic Service Innovations to Accelerate Co-Creation of Value.

[90] Human Factors Engineering (2011).

[91] J. B. Brooke et al. (2013). SUS: a retrospective.

[92] James R. Lewis et al. (2010). Practical Speech User Interface Design.

[93] S. Dunn (1988). Attitudes Can Be Measured.

[94] Kasper Hornbæk et al. (2009). Non-universal usability? A survey of how usability is understood by Chinese and Danish users. CHI.

[95] John D. Gould et al. (1983). Human factors challenges in creating a principal support office system: the speech filing system approach. TOIS.

[96] Jo Wood et al. (2001). On the reliability of usability testing. CHI Extended Abstracts.

[97] Fred D. Davis (1989). Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Q.

[98] Rex B. Kline et al. (2006). Usability measurement and metrics: A consolidated model. Software Quality Journal.

[99] Harold W. Thimbleby et al. (2007). User-Centered Methods Are Insufficient for Safety Critical Systems. USAB.

[100] S. S. Stevens et al. (1946). On the Theory of Scales of Measurement. Science.

[101] Rick Spencer et al. (2000). The streamlined cognitive walkthrough method, working around social constraints encountered in a software development company. CHI.

[102] Kraig Finstad et al. (2010). The Usability Metric for User Experience. Interact. Comput.

[103] J. Gaito (1980). Measurement scales and statistics: Resurgence of an old misconception.

[104] Ben Shneiderman et al. (1998). Designing the User Interface: Strategies for Effective Human-Computer Interaction.

[105] Joseph S. Dumas et al. (2007). The great leap forward: the birth of the usability profession (1988-1993).

[106] Lionel C. Briand et al. (2000). A Comprehensive Evaluation of Capture-Recapture Models for Estimating Software Defect Content. IEEE Trans. Software Eng.

[107] R. Abelson (1995). Statistics As Principled Argument.

[108] Anne Marsden et al. (2014). International Organization for Standardization.

[109] J. Preece et al. (2003). The Human-Computer Interaction Handbook.

[110] John D. Gould (1988). Chapter 35: How to Design Usable Systems.

[111] J. R. Lewis et al. (1994). Sample Sizes for Usability Studies: Additional Considerations. Human factors.

[112] Stephen L. Vargo et al. (2007). Competing through service: Insights from service-dominant logic.

[113] Ergonomic requirements for office work with visual display terminals (VDTs), Part 11: Guidance on usability (1998).

[114] Jeff Sauro et al. (2011). When designing usability questionnaires, does it hurt to be positive? CHI.

[115] Sharon McDonald et al. (2013). Thinking-aloud about web navigation.

[116] J. Michell (1986). Measurement scales and statistics: A clash of paradigms.

[117] Karel Vredenburg et al. (2002). A survey of user-centered design practice. CHI.

[118] Robert A. Virzi (1990). Streamlining the Design Process: Running Fewer Subjects.

[119] Jacob Cohen et al. (1990). Things I Have Learned (So Far).

[120] A. Agresti et al. (1998). Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions.

[121] Jan Stage et al. (2006). The Impact of Usability Reports and User Test Observations on Developers' Understanding of Usability Data: An Exploratory Study. Int. J. Hum. Comput. Interact.

[122] E. Krahmer et al. (2004). Thinking about thinking aloud: a comparison of two verbal protocols for usability testing. IEEE Transactions on Professional Communication.

[123] Richard J. Harris (1975). A primer of multivariate statistics.

[124] Marc Hassenzahl et al. (2004). The Interplay of Beauty, Goodness, and Usability in Interactive Products. Hum. Comput. Interact.

[125] Nigel Bevan et al. (2009). Extending Quality in Use to Provide a Framework for Usability Measurement. HCI.

[126] Gitte Lindgaard et al. (2013). Introduction to the Special Issue: The Tricky Landscape of Developing Rating Scales in HCI. Interact. Comput.

[127] J. Nunnally et al. (1978). Psychometric Theory. New York.

[128] Martin Schmettow et al. (2009). Controlling the usability evaluation process under varying defect visibility. BCS HCI.

[129] Kasper Hornbæk et al. (2010). Dogmas in the assessment of usability evaluation methods. Behav. Inf. Technol.

[130] James R. Lewis et al. (2001). Evaluation of Procedures for Adjusting Problem-Discovery Rates Estimated From Small Samples. Int. J. Hum. Comput. Interact.

[131] Jeff Sauro et al. (2009). Correlations among prototypical usability metrics: evidence for the construct of usability. CHI.

[132] M. Furlong et al. (2009). Eight Was Not Enough.

[133] Kasper Hornbæk et al. (2006). Current practice in measuring usability: Challenges to usability studies and research. Int. J. Hum. Comput. Stud.

[134] I. Campbell (2007). Chi-squared and Fisher–Irwin tests of two-by-two tables with small sample recommendations. Statistics in medicine.

[135] Dennis R. Wixon (2003). Evaluating usability methods: why the current literature fails the practitioner. INTR.

[136] Victoria A. Bowers (1990). Concurrent versus Retrospective Verbal Protocol for Comparing Window Usability.

[137] Thomas S. Tullis et al. (1985). Designing a menu-based interface to an operating system. CHI '85.

[138] Kraig Finstad et al. (2013). Response to commentaries on 'The Usability Metric for User Experience'. Interact. Comput.

[139] W. C. Howell (2010). of the Human Factors and Ergonomics Society.

[140] Martin Schmettow (2008). Heterogeneity in the usability evaluation process.

[141] Robert A. Virzi et al. (1992). Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough?

[142] M. Cowles (1989). Statistics in Psychology: An Historical Perspective.

[143] Patrick T. Harker et al. (2002). Customer Efficiency.

[144] Morten Hertzum et al. (2007). Usability Constructs: A Cross-Cultural Study of How Users and Developers Experience Their Use of Information Systems. HCI.

[145] Marc Hassenzahl (2000). Prioritizing usability problems: Data-driven and judgement-driven severity estimates. Behav. Inf. Technol.

[146] Janice Ginny Redish et al. (2007). Expanding usability testing to evaluate complex systems.

[147] Martin Schmettow et al. (2008). Heterogeneity in the usability evaluation process. BCS HCI.

[148] Ivo Schneider (2005). Statistics on the table: The history of statistical concepts and methods.

[149] Stefano Federici et al. (2009). On the dimensionality of the System Usability Scale: a test of alternative measurement models. Cognitive Processing.

[150] Joseph S. Dumas et al. (2008). Comparative usability evaluation (CUE-4). Behav. Inf. Technol.

[151] James R. Lewis (1991). Psychometric evaluation of an after-scenario questionnaire for computer usability studies.

[152] John D. Gould et al. (1995). How to design usable systems.

[153] Lars Schmidt et al. (1999). Comparative evaluation of usability tests. CHI Extended Abstracts.

[154] James R. Lewis (1999). Tradeoffs in the Design of the IBM Computer Usability Satisfaction Questionnaires. HCI.

[155] James E. Burroughs et al. (2003). Do Reverse-Worded Items Confound Measures in Cross-Cultural Consumer Research? The Case of the Material Values Scale.

[156] Joseph B. Sidowski et al. (1990). Measurements of computer satisfaction, literacy, and aptitudes: A review. Int. J. Hum. Comput. Interact.

[157] James R. Lewis et al. (2006). Sample sizes for usability tests: mostly math, not magic. INTR.

[158] H. Kanis (2011). Estimating the number of usability problems. Applied ergonomics.

[159] Allen Newell et al. (1983). The psychology of human-computer interaction.

[160] Philip T. Kortum et al. (2013). Usability Ratings for Everyday Products Measured With the System Usability Scale. Int. J. Hum. Comput. Interact.

[161] James R. Lewis et al. (2013). UMUX-LITE: when there's no time for the SUS. CHI.

[162] Richard C. Larson et al. (2008). Service science: At the intersection of management, social, and engineering sciences. IBM Syst. J.

[163] Stefano Federici et al. (2011). The Bootstrap Discovery Behaviour (BDB): a new outlook on usability evaluation. Cognitive Processing.

[164] Jakob Nielsen et al. (1993). A mathematical model of the finding of usability problems. INTERCHI.

[165] J. B. Brooke et al. (1996). SUS: A 'Quick and Dirty' Usability Scale.

[166] Wayne D. Gray et al. (1998). Damaged Merchandise? A Review of Experiments That Compare Usability Evaluation Methods. Hum. Comput. Interact.

[167] Jared M. Spool et al. (2001). Testing web sites: five users is nowhere near enough. CHI Extended Abstracts.

[168] Paul E. Spector et al. (1997). When Two Factors Don’t Reflect Two Constructs: How Item Characteristics Can Produce Artifactual Factors.

[169] Mary Corbett et al. (1993). SUMI: the Software Usability Measurement Inventory. Br. J. Educ. Technol.

[170] Jack Grove (1989). In Defence of Science: Science, Technology, and Politics in Modern Society.

[171] James R. Lewis et al. (1991). Psychometric evaluation of an after-scenario questionnaire for computer usability studies: the ASQ. SGCH.

[172] Marc Hassenzahl et al. (2001). The Effect of Perceived Hedonic Quality on Product Appealingness. Int. J. Hum. Comput. Interact.

[173] Jean Scholtz et al. (2000). Common industry format for usability test reports. CHI Extended Abstracts.

[174] Ann Heylighen et al. (2007). How relative absolute can be: SUMI and the impact of the nature of the task in measuring perceived software usability. AI & SOCIETY.

[175] James R. Lewis et al. (1985). Using Cognitive Models to Create Menus.

[176] Robert W. Bailey et al. (1992). Usability Testing vs. Heuristic Evaluation: A Head-to-Head Comparison.

[177] Jennifer L. Martin et al. (2013). Reviewing and Extending the Five-User Assumption: A Grounded Procedure for Interaction Evaluation. TCHI.

[178] Kasper Hornbæk et al. (2009). Exploring the Value of Usability Feedback Formats. Int. J. Hum. Comput. Interact.

[179] M.D.T. de Jong et al. (2003). Exploring two methods of usability testing: concurrent versus retrospective think-aloud protocols.

[180] James R. Lewis et al. (2002). Psychometric Evaluation of the PSSUQ Using Data from Five Years of Usability Studies. Int. J. Hum. Comput. Interact.