The role of voice input for human-machine communication.

Optimism is growing that the near future will witness rapid growth in human-computer interaction using voice. Prototype systems have recently demonstrated speaker-independent, real-time recognition and understanding of naturally spoken utterances with vocabularies of 1,000 to 2,000 words and larger, and computer manufacturers are already building speech recognition subsystems into their new product lines. Before this technology can be broadly useful, however, a substantial knowledge base is needed about human spoken language and performance during computer-based spoken interaction. This paper reviews application areas in which spoken interaction can play a significant role, assesses the potential benefits of spoken interaction with machines, and compares voice with other modalities of human-computer interaction. It also discusses the information that will be needed to build a firm empirical foundation for the design of future spoken and multimodal interfaces. Finally, it argues for a more systematic and scientific approach to investigating spoken input and human performance with future language technology.
