Towards Conversational Speech Recognition for a Wearable Computer Based Appointment Scheduling Agent

We present an original study of current mobile appointment scheduling devices. Our intention is to create a conversational wearable computing interface for the task of appointment scheduling. We employ both survey questionnaires and timing tests of mock scheduling tasks. The study includes over 150 participants and times each person using his or her own scheduling device (e.g., a paper planner or personal digital assistant). Our tests show that current scheduling devices take a surprisingly long time to access and that our subjects often do not use the primary scheduling device claimed on the questionnaire. Slower devices (e.g., PDAs) are disproportionately abandoned in favor of devices with faster access times (e.g., scrap paper). Many subjects indicate that, when mobile, they use a faster device as a buffer until they can reconcile the data with their primary scheduling device. The findings of this study motivated the design of two conversational speech systems for everyday-use wearable computers. The first, the Calendar Navigator Agent, provides extremely fast access to the user’s calendar through a wearable computer with a head-up display. The wearable monitors the user’s verbal negotiation for a meeting time and provides an appropriate calendar display based on the current conversation. The second system, now under development, attempts to minimize cognitive load by buffering and indexing appointment conversations for later processing by the user. Both systems use extreme restrictions to decrease speech recognition error rates, yet are designed to be socially graceful.
