Evaluating website quality: Five studies on user-focused evaluation methods

The benefits of evaluating websites with potential users are widely acknowledged. Several methods can be used to evaluate website quality from the user's perspective. In current practice, however, many evaluations are conducted with inadequate methods that lack research-based validation. This thesis aims to provide more insight into evaluation methodology and to contribute to a higher standard of website evaluation in practice.

A first way to evaluate website quality is to measure users' opinions. This is often done with questionnaires, which gather opinions in a cheap, fast, and easy way. However, many questionnaires lack a solid statistical basis and a justification of the chosen quality dimensions and questions. We therefore developed the Website Evaluation Questionnaire (WEQ), which was specifically designed for the evaluation of governmental websites. In a study in online and laboratory settings, the WEQ proved to be a valid and reliable instrument.

A way to gather more specific user opinions is to invite participants to review website pages. Participants provide their comments by clicking a feedback button, marking a problematic segment, and formulating their feedback. There has been debate about the extent to which users are able to provide relevant feedback. The results of our studies showed that participants were able to provide useful feedback: they signalled many relevant problems that were indeed experienced by users who needed to find information on the website.

Website quality can also be measured during participants' task performance. A frequently used method is the concurrent think-aloud method (CTA), in which participants verbalize their thoughts while performing tasks. There have been doubts about the usefulness and exhaustiveness of participants' verbalizations. We therefore combined CTA with eye tracking in order to examine which cognitive processes participants do and do not verbalize. The results showed that the participants' verbalizations provided substantial information in addition to the directly observable user problems. There was also a rather high percentage of silence (27% of task time) during which interesting observations could be made about users' processes and obstacles. A thorough evaluation should therefore combine verbalizations with (eye-tracking) observations.

In a retrospective think-aloud (RTA) evaluation, participants verbalize their thoughts afterwards while watching a recording of their performance. A problem with RTA is that participants do not always remember the thoughts they had during task performance. We therefore complemented the dynamic screen replay of participants' actions (pages visited and mouse movements) with a dynamic gaze replay of their eye movements. Contrary to our expectations, no differences were found between the two conditions.

It is not possible to draw conclusions about the single best method. The value of a specific method is strongly influenced by the goals and context of an evaluation. Moreover, the outcomes of an evaluation depend not only on the method, but also on other choices made during the evaluation, such as participant selection, tasks, and the subsequent analysis.
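The abstract does not detail how the WEQ's reliability was established; as a purely illustrative aside, the internal consistency of a questionnaire scale is commonly summarized with Cronbach's alpha. The sketch below computes it for hypothetical item scores on a single quality dimension (all names and data are invented for illustration, not taken from the WEQ studies):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point ratings from six respondents on a four-item scale.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 4, 3, 3],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # values >= .70 are often deemed acceptable
```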

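Similarly, the 27% silence figure in the CTA study is simply the proportion of session time not covered by any verbalization. A minimal sketch of that bookkeeping, assuming hypothetical timestamped utterance intervals from one recorded session, could look like this:

```python
# Hypothetical utterance intervals (start, end) in seconds within one session.
utterances = [(0.0, 12.5), (15.0, 40.0), (48.0, 90.0), (110.0, 176.5)]
session_length = 200.0  # total task time in seconds (invented for illustration)

def silence_share(intervals, total):
    """Fraction of the session not covered by any utterance interval."""
    # Merge overlapping intervals so simultaneous talk is not double-counted.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    spoken = sum(end - start for start, end in merged)
    return 1 - spoken / total

print(f"silent: {silence_share(utterances, session_length):.0%}")
```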