Towards a tool for predicting speech functionality

Abstract

In these days of multimodal systems and interfaces, many research teams are investigating the purposes for which novel combinations of modalities can be used. It is easy to forget that we still lack solid foundations for evaluating the functionality of individual families of input–output modalities, such as the speech modalities. These foundations are missing because of the complexity of the problem. Because they are based on the study of particular applications, empirical investigations of speech functionality address only isolated points in a vast multi-dimensional design space. At best, solid findings yield low-level generalisations which can be used by designers developing almost identical applications. Furthermore, the conceptual and theoretical apparatus needed to describe these findings in a principled way is largely missing. This paper argues that a shift in perspective can help address issues of modality choice both scientifically and in design practice. Instead of focusing empirically on fragments of the virtually infinite combinatorics of tasks, environments, performance parameters, user groups, cognitive properties, etc., the problem of modality functionality is addressed as a problem of choosing between modalities which have very different properties with respect to the representation and exchange of information between user and system. Based on a study of 120 claims on speech functionality from the literature, it is shown that a small set of modality properties is surprisingly powerful in justifying, supporting and correcting the set of claims. The paper analyses why modality properties can be used for these purposes and argues that their power could be made available to systems and interface designers who have to make modality choices during the early design of speech-related systems and interfaces. Using hypertext, it is illustrated how this power may be harnessed to predictively support speech modality choice during early systems and interface design.
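To make the approach concrete, the following minimal Python sketch shows one way such a prediction tool might encode the core idea: each modality carries a set of declarative properties, and a design claim about speech is checked against those properties. All names here (Modality, Claim, evaluate, and the example properties and claim) are hypothetical illustrations under assumed semantics, not the paper's actual property set or implementation.

```python
# Illustrative sketch only: the property names and the example claim are
# hypothetical stand-ins for the kind of modality properties the paper studies.
from dataclasses import dataclass, field

@dataclass
class Modality:
    name: str
    properties: set[str] = field(default_factory=set)

# A few assumed example properties for the speech output modality.
SPEECH_OUTPUT = Modality("speech output", {
    "omnidirectional",   # does not require visual attention
    "temporal",          # transient; not available for later inspection
    "linguistic",        # interpretationally rich, not spatially analogue
})

@dataclass
class Claim:
    text: str
    requires: set[str]   # properties the claimed use depends on
    conflicts: set[str]  # properties that would undermine the claim

def evaluate(claim: Claim, modality: Modality) -> str:
    """Return a coarse verdict: supported, corrected, or undecided."""
    if claim.conflicts & modality.properties:
        return "corrected"   # a modality property contradicts the claim
    if claim.requires <= modality.properties:
        return "supported"   # all required properties hold
    return "undecided"       # the property set cannot settle the claim

# Hypothetical claim of the kind found in the literature study.
alarm_claim = Claim(
    text="Speech output suits alarms in eyes-busy tasks",
    requires={"omnidirectional"},
    conflicts=set(),
)

print(evaluate(alarm_claim, SPEECH_OUTPUT))  # -> supported
```

On this reading, a claim that conflicts with a modality property is corrected, a claim whose requirements the property set satisfies is supported, and anything else is left to empirical investigation, which mirrors the justify/support/correct roles described in the abstract.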
