Usability of dialogue design strategies for automated surname capture

Surname capture via automatic speech recognition over the telephone has many commercial applications, including automated directory assistance and travel reservation services. This paper presents a usability evaluation of three dialogue designs for automated surname capture within the context of a flight reservation service. The three designs explored were: a Speak Only strategy, in which callers simply say the surname; a One Stage Speak and Spell strategy, in which callers speak and spell the surname in a single utterance; and a Two Stage Speak and Spell strategy, in which callers speak and spell the surname in two separate dialogue stages. The methodology employed provides both quantitative user attitude data and performance results for each strategy, based on an empirical study with a cohort of 95 participants. The results show a clear distinction between the strategies. User attitude towards the two dialogues that involve both speaking and spelling the name is positive, whereas attitude towards the Speak Only strategy is significantly less positive. Task completion rates are also significantly higher for the two strategies that involve spelling the name, at around 80% compared with just over 50% for the Speak Only strategy. The data underline the importance of user testing, demonstrate the value of the evaluation methodology used, and provide encouraging results for the strategies that involve both speaking and spelling the name.
