Who, Why and How Often? Key Elements for the Design of a Successful Speech Application Taking Account of the Target Groups

Three questions have to be answered before designing a speech application: who will use it, why will they use it, and how often will they use it? A designer needs answers to all of these questions to address the needs of the target group as well as possible. This chapter outlines a methodical procedural model describing the workflow required to build a speech application that is properly designed for its target groups. The workflow covers requirements analysis, specification, implementation, production, delivery and operation. The chapter also gives an overview of the most important information needed to describe a voice user interface and where this information can be found, as well as of current and future technical developments in the field of speech processing and their relevance for future dialogue design. We then recommend 11 design features which, in our experience, help the designer of a voice user interface to exploit knowledge about the users and to focus the dialogue design on their abilities, competence, expectations and needs.