Use of dialogue, pragmatics and semantics to enhance speech recognition

Abstract: Current state-of-the-art speaker-independent continuous-speech recognizers achieve word recognition rates above 94 percent with lexicons of 1000 words or fewer and grammars or language models of perplexity 60 or less. Their performance degrades rapidly as grammar perplexity increases, and when users are free to speak naturally, using constructions of their own choosing, perplexity rises by more than an order of magnitude. Fortunately, knowledge of the domain and of communicative and problem-solving behavior can be used to decrease perplexity dynamically, allowing more natural interaction with current speech recognition technology. The perplexity reduction obtained from this knowledge yields recognition performance equal to that of recognizers using a language model of equivalently low perplexity, whether in the same domain or a different one. This paper describes how knowledge of domain semantics, dialog, communication conventions and problem-solving behavior is used to enhance automatic speech recognition and understanding. It discusses the basic principles of the MINDS system and describes its most important knowledge sources and heuristics. Prior perplexity-reduction results are reviewed, demonstrating the system's ability to reduce perplexity dynamically and improve recognition performance. This is followed by a brief analysis of several heuristics that need not be reimplemented across domains, addressing why they are effective and how much each can be expected to reduce entropy and average branching factor in any application domain.
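The link between entropy, perplexity and average branching factor can be made concrete with a small calculation. The sketch below is illustrative only and is not the MINDS implementation: it assumes a next-word distribution stored as a plain dictionary, and the filter `restrict_to_predictions`, the 1000-word vocabulary and the 50 "predicted" words are hypothetical stand-ins for a dialog-driven prediction step. Perplexity is computed as 2 raised to the entropy in bits, so for a uniform choice among N words it equals N.

```python
import math

def entropy_bits(probs):
    # Shannon entropy H = -sum p * log2(p) of a next-word distribution (word -> prob).
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def perplexity(probs):
    # Perplexity = 2 ** H; for a uniform choice among N words it equals N,
    # which is why it is read as an average branching factor.
    return 2 ** entropy_bits(probs)

def restrict_to_predictions(probs, predicted):
    # Hypothetical context filter: keep only the words the current dialog/domain
    # state predicts, then renormalize so the probabilities sum to 1.
    kept = {w: p for w, p in probs.items() if w in predicted}
    total = sum(kept.values())
    if total == 0:
        # The predictions matched no word: fall back to the unrestricted model.
        return dict(probs)
    return {w: p / total for w, p in kept.items()}

# Illustrative numbers only (not results from the paper): a free-form model
# with 1000 equally likely words versus a context that predicts 50 of them.
vocab = [f"w{i}" for i in range(1000)]
free_form = {w: 1 / len(vocab) for w in vocab}
predicted = set(vocab[:50])
in_context = restrict_to_predictions(free_form, predicted)

print(round(perplexity(free_form)))   # 1000
print(round(perplexity(in_context)))  # 50
print(round(entropy_bits(free_form) - entropy_bits(in_context), 2))  # 4.32 bits saved
```

Restricting to 50 of 1000 equiprobable words removes log2(20), about 4.3 bits, per word. Real predictions are neither uniform nor always correct, which is why the fallback branch returns the unrestricted distribution instead of an empty one.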
