Integrating Experimental Models of Syntax, Phonology, and Accent/Dialect in a Speech Recognizer

As the field of speech understanding matures, and particularly as the quality of front-end and phonetic components improves, researchers have begun to explore ways to add new kinds of language knowledge to the recognition process. Such work includes augmenting recognizers with models of contextual dependencies (Cohen 1989; Phillips et al. 1991), more advanced models of syntax (Seneff et al. 1992; Kai & Nakagawa 1992), and gender information (Murveit et al. 1991). This new direction is being developed at ICSI in the context of the Berkeley Restaurant Project (BeRP), a medium-vocabulary (1300-word), speaker-independent, spontaneous continuous-speech understanding system. The primary function of BeRP is to serve as a testbed for a number of our speech-related research projects, including robust feature extraction, connectionist speech recognition, automatic induction of multiple-pronunciation lexicons, foreign accent detection and modeling, and the use of advanced language models. BeRP functions as a knowledge consultant whose domain is restaurants in the city of Berkeley, California, and in this role it draws inspiration from earlier consultants such as VOYAGER (Zue et al. 1991). Users ask BeRP spoken-language questions; in a mixed-initiative fashion, the system directs questions back to the user, then queries a database of restaurants and gives advice based on user criteria such as cost, type of food, and location. This paper describes three preliminary experiments in adding new language knowledge to the BeRP recognizer.
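The mixed-initiative behavior described above, in which the system asks the user about any criterion it still lacks before querying its restaurant database, can be illustrated with a minimal slot-filling sketch. Everything here is an assumption for illustration only: the `Restaurant` record, the slot names, the question wording, the fixed order in which criteria are requested, and the toy database entries are hypothetical and are not taken from BeRP's actual dialog manager or database.

```python
from dataclasses import dataclass

# Hypothetical record type: fields mirror the criteria named in the text
# (type of food, cost, location); this is not BeRP's real database schema.
@dataclass(frozen=True)
class Restaurant:
    name: str
    cuisine: str
    price: int          # assumed typical cost per person, in dollars
    neighborhood: str

# Assumed order in which the system takes the initiative to ask for a criterion.
SLOTS = ("cuisine", "max_price", "neighborhood")
QUESTIONS = {
    "cuisine": "What type of food would you like?",
    "max_price": "How much would you like to spend?",
    "neighborhood": "Which part of Berkeley?",
}


def next_turn(constraints: dict, db: list) -> tuple:
    """One mixed-initiative step: if a criterion is still unknown, the system
    asks the user for it; once all are known, it queries the database."""
    for slot in SLOTS:
        if constraints.get(slot) is None:
            return ("ask", QUESTIONS[slot])
    matches = [
        r.name
        for r in db
        if r.cuisine == constraints["cuisine"]
        and r.price <= constraints["max_price"]
        and r.neighborhood == constraints["neighborhood"]
    ]
    return ("answer", matches)


# Toy database with hypothetical entries.
db = [
    Restaurant("Hypothetical Thai Cafe", "thai", 12, "downtown"),
    Restaurant("Hypothetical Bistro", "french", 40, "north side"),
]

print(next_turn({"cuisine": "thai"}, db))
# ('ask', 'How much would you like to spend?')
print(next_turn({"cuisine": "thai", "max_price": 15, "neighborhood": "downtown"}, db))
# ('answer', ['Hypothetical Thai Cafe'])
```

A fixed slot order keeps the sketch short. Criteria the user has already volunteered (any key already present in `constraints`) are never asked for again, which is the essence of letting either party take the initiative; a real system would fill those slots from the parsed utterance rather than from a hand-built dictionary.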

[1] Andreas Stolcke et al. The Berkeley Restaurant Project, 1994, ICSLP.

[2] Andreas Stolcke et al. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities, 1994, Computational Linguistics.

[3] David Goddeau et al. Using probabilistic shift-reduce parsing in speech recognition systems, 1992, ICSLP.

[4] Victor Zue et al. Language modelling for recognition and understanding using layered bigrams, 1992, ICSLP.

[5] Andreas Stolcke et al. Multiple-pronunciation lexical modeling in a speaker independent speech understanding system, 1994, ICSLP.

[6] Mitch Weintraub et al. Speech Recognition in SRI's Resource Management and ATIS Systems, 1991, HLT.

[7] Hynek Hermansky et al. RASTA-PLP speech analysis technique, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Atsuhiko Kai et al. A frame-synchronous continuous speech recognition algorithm using a top-down parsing of context-free grammar, 1992, ICSLP.

[9] Andreas Stolcke et al. Precise N-Gram Probabilities From Stochastic Context-Free Grammars, 1994, ACL.

[10] James Glass et al. Integration of speech recognition and natural language processing in the MIT VOYAGER system, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[11] John Cocke et al. Probabilistic Parsing Method for Sentence Disambiguation, 1989, IWPT.

[12] Horacio Franco et al. Hybrid neural network/hidden Markov model continuous-speech recognition, 1992, ICSLP.

[13] Andreas Stolcke et al. Best-first Model Merging for Hidden Markov Model Induction, 1994, arXiv.

[14] Michael Riley et al. A statistical model for generating pronunciation networks, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[15] C. Pollard et al. Center for the Study of Language and Information, 2022.

[16] Kenji Kita et al. Incorporating LR parsing into SPHINX, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[17] David Goodine et al. Full integration of speech and language understanding in the MIT spoken language system, 1991, EUROSPEECH.

[18] H. Bourlard et al. Connectionist Speech Recognition: Status and Prospects, 1991.

[19] Hy Murveit et al. Integrating Speech and Natural-Language Processing, 1989, HLT.

[20] Lotfi A. Zadeh et al. Phonological structures for speech recognition, 1989.

[21] John D. Lafferty et al. Computation of the Probability of Initial Substring Generation by Stochastic Context-Free Grammars, 1991, Computational Linguistics.

[22] Victor Zue et al. Modelling Context Dependency in Acoustic-Phonetic and Lexical Representations, 1991, HLT.

[23] J. Baker. Trainable grammars for speech recognition, 1979.