The use of linguistic hierarchies in speech understanding

This paper describes two related systems which provide frameworks for encoding linguistic knowledge into formal rules within the context of a trainable probabilistic model. The first system, TINA [33], drives top-down from sentence level structure, terminating in either words or syllables. Its main purpose is to provide a meaning representation for the sentence. The other system, ANGIE [36], operates bottom-up from phonetic or orthographic units, characterizing the substructure of syllables/words. It provides a framework for both phonological rule modelling and letter-to-sound/sound-to-letter transformations. The two systems logically converge on the syllable or word layer. We have recently been successful in integrating their combined constraint into a recognizer search, achieving considerable improvement in understanding accuracy [9, 23]. In this paper, I will look both toward the past and the future, identifying and motivating the decisions that were made in the design of TINA and ANGIE and the associated rule formalisms, and contemplating various remaining open research issues.

[1]  Margaret King,et al.  Parsing Natural Language , 1983 .

[2]  Victor Zue The use of phonetic rules in automatic speech recognition , 1983, Speech Commun.

[3]  Victor Zue,et al.  Multilingual spoken-language understanding in the MIT Voyager system , 1995, Speech Commun.

[4]  William A. Woods,et al.  Transition Network Grammars for Natural Language Analysis , 1970 .

[5]  Goopeel Chung Hierarchical Duration Modelling for a Speech Recognition System , 1997 .

[6]  Victor Zue,et al.  Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology , 1993, Speech Commun.

[7]  James R. Glass,et al.  Telephone-based conversational speech recognition in the JUPITER domain , 1998, ICSLP.

[8]  Joseph Polifroni,et al.  A new restaurant guide conversational system: issues in rapid prototyping for specialized domains , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[9]  Stephanie Seneff,et al.  ANGIE: a new framework for speech analysis based on morpho-phonological modelling , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[10]  Victor Zue,et al.  WHEELS: a conversational system in the automobile classifieds domain , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[11]  Mark A. Randolph,et al.  Syllable-based constraints on properties of English sounds , 1989 .

[12]  Lori Lamel,et al.  Speaker-independent continuous speech dictation , 1993, Speech Communication.

[13]  H. Kucera,et al.  Computational analysis of present-day American English , 1967 .

[14]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Wayne H. Ward Understanding spontaneous speech: the Phoenix system , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[16]  Stephanie Seneff,et al.  Improvements in speech understanding accuracy through the integration of hierarchical linguistic, prosodic, and phonological constraints in the jupiter domain , 1998, ICSLP.

[17]  S. Seneff A joint synchrony/mean-rate model of auditory speech processing , 1990 .

[18]  A. Nadas,et al.  Estimation of probabilities in the language model of the IBM speech recognition system , 1984 .

[19]  Victor Zue,et al.  YINHE: a Mandarin Chinese version of the GALAXY system , 1997, EUROSPEECH.

[20]  David Goodine,et al.  A French version of the MIT-ATIS system: portability issues , 1993, EUROSPEECH.

[21]  Noam Chomsky,et al.  Lectures on Government and Binding; Some Concepts and Consequences of the Theory of Government and Binding , 1984 .

[22]  Victor Zue,et al.  PEGASUS: A spoken dialogue interface for on-line air travel planning , 1994, Speech Communication.

[23]  Hy Murveit,et al.  Linguistic constraints in hidden Markov model based speech recognition , 1989, International Conference on Acoustics, Speech, and Signal Processing.

[24]  Lotfi A. Zadeh,et al.  Phonological structures for speech recognition , 1989 .

[25]  Stephanie Seneff,et al.  TINA: A Natural Language System for Spoken Language Applications , 1992, Comput. Linguistics.

[26]  Stephanie Seneff,et al.  Providing sublexical constraints for word spotting within the ANGIE framework , 1997, EUROSPEECH.

[27]  George A. Miller,et al.  Nouns in WordNet: A Lexical Inheritance System , 1990 .

[28]  Noam Chomsky,et al.  The Sound Pattern of English , 1968 .

[29]  James R. Glass,et al.  A probabilistic framework for feature-based speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[30]  David B. Pisoni,et al.  Text-to-speech: the mitalk system , 1987 .

[31]  Stephanie Seneff,et al.  Automated English-Korean Translation for Enhanced Coalition Communications , 1997 .

[32]  Stephanie Seneff,et al.  Hierarchical duration modelling for speech recognition using the ANGIE framework , 1997, EUROSPEECH.

[33]  Stephanie Seneff,et al.  Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation , 1994, HLT.

[34]  Raymond Lau,et al.  Subword lexical modelling for speech recognition , 1998 .

[35]  Victor Zue,et al.  From interface to content: translingual access and delivery of on-line information , 1997, EUROSPEECH.

[36]  Victor Zue,et al.  Language modelling for recognition and understanding using layered bigrams , 1992, ICSLP.

[37]  Aarati D. Parmar A semi-automatic system for the syllabification and stress assignment of large lexicons , 1997 .

[38]  Ray Jackendoff Semantics and Cognition , 1983 .

[39]  Stephanie Seneff Robust Parsing for Spoken Language Systems , 1992 .

[40]  Kenneth Ward Church Phrase-structure parsing: a method for taking advantage of allophonic constraints , 1983 .