Linguistic Knowledge and Empirical Methods in Speech Recognition

Automatic speech recognition is one of the fastest-growing and most commercially promising applications of natural language technology. The technology has reached a point where carefully designed systems for suitably constrained applications are a reality. Commercial systems are available today for such tasks as large-vocabulary dictation and voice control of medical equipment. This article reviews how state-of-the-art speech-recognition systems combine statistical modeling, linguistic knowledge, and machine learning to achieve their performance, and points out some of the research issues in the field.
