Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces

The goal of this study was to evaluate a set of voice interaction models (a hands-free solution activated by a wake-up word, a mobile app, and a TV remote control with a microphone) and identify the most appropriate solution for interactive television. The research addressed issues associated with natural language systems, such as usability, interaction, and perceived privacy, and aimed to analyze the strengths and limitations of each voice interaction model. The first evaluation used a prototype based on the Wizard-of-Oz methodology, while the second used a functional prototype. The preferred interaction model was the hands-free solution activated by a wake-up word, because it was easy to use and raised the fewest difficulties in task execution. Despite this result, the other two models should not be ruled out for a future voice interaction system for television. The TV remote control was the most natural way of interacting for the study's participants, and the control afforded by the remote and the app made participants feel that these options offer more privacy. Participants considered that a voice-operated system for TV would be very useful, and almost all were receptive to having such a system at home. Lastly, based on commercial standards and guidelines, solutions to the issues participants identified in the visual interface of the TV system were proposed and considered for the next phase of prototype development, which may also benefit other research in the field.

[5] Silvia Fernandes, et al. Voice interaction on TV: analysis of natural language interaction models and recommendations for voice user interfaces, 2018, Multimedia Tools and Applications.
