Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?

Chatbots are becoming increasingly popular as a human-computer interface. The best practices normally applied to User Experience (UX) design cannot easily be applied to chatbots, nor can conventional usability testing techniques be guaranteed to give accurate results. WeightMentor is a bespoke self-help motivational tool for weight loss maintenance. This study addresses four research questions: (1) How usable is the WeightMentor chatbot, according to conventional usability methods? (2) To what extent do different conventional usability questionnaires correlate when evaluating chatbot usability, and how do they correlate with a tailored chatbot usability survey score? (3) What is the optimum number of users required to identify chatbot usability issues? (4) How many task repetitions are required for first-time chatbot users to reach optimum task performance (i.e. efficiency based on task completion times)? This paper describes the procedure for testing the WeightMentor chatbot, assesses correlation between typical usability testing metrics, and suggests that conventional wisdom on participant numbers for identifying usability issues may not apply to chatbots. The study design was a usability study. WeightMentor was tested using a pre-determined usability testing protocol, evaluating ease of task completion, unique usability errors and participant opinions on the chatbot (collected using usability questionnaires). WeightMentor usability scores were generally high, and correlation between questionnaires was strong. The optimum number of users for identifying chatbot usability errors was 26, which challenges previous research. Chatbot users reached optimum proficiency in tasks after just one repetition.
Usability test outcomes confirm what is already known about chatbots: they are highly usable (due to their simple interface and conversation-driven functionality), but conventional methods for assessing usability and user experience may not be as accurate when applied to chatbots.
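
The abstract does not state how the figure of 26 users was derived. As a minimal sketch of how participant numbers are commonly reasoned about, the cumulative problem-discovery model associated with Nielsen and Landauer estimates the expected share of usability problems found by n users as 1 - (1 - p)^n, where p is the probability that a single user exposes a given problem. The function names and the values of p below are illustrative assumptions, not figures from this study; p = 0.31 is the often-quoted average behind the "five users" rule of thumb.

```python
import math


def proportion_found(n_users: int, discovery_rate: float) -> float:
    """Expected share of problems found by n_users under 1 - (1 - p)^n."""
    if n_users < 0 or not 0.0 < discovery_rate < 1.0:
        raise ValueError("need n_users >= 0 and 0 < discovery_rate < 1")
    return 1.0 - (1.0 - discovery_rate) ** n_users


def users_needed(target: float, discovery_rate: float) -> int:
    """Smallest number of users whose expected coverage reaches target."""
    if not 0.0 < target < 1.0 or not 0.0 < discovery_rate < 1.0:
        raise ValueError("target and discovery_rate must lie in (0, 1)")
    # Solve 1 - (1 - p)^n >= target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - discovery_rate))


if __name__ == "__main__":
    # Illustrative per-user discovery rates (assumed, not measured here).
    for p in (0.31, 0.10, 0.05):
        n = users_needed(0.85, p)
        print(f"p={p:.2f}: {n} users for 85% expected coverage "
              f"({proportion_found(n, p):.1%} reached)")
```

Under this model, the number of users required grows quickly as p falls, which is one way a study could arrive at a sample size well above five; whether that explains the result reported here is not something the abstract confirms.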
