Lexical Access for Speech Understanding using Minimum Message Length Encoding

The Lexical Access Problem consists of determining the intended sequence of words corresponding to an input sequence of phonemes (basic speech sounds) produced by a low-level phoneme recognizer. In this paper we present an information-theoretic approach to the Lexical Access Problem based on the Minimum Message Length Criterion. We model sentences using phoneme realizations seen in training, together with word and part-of-speech information obtained from text corpora. We report results on multiple-speaker, continuous, read speech, and we discuss a heuristic based on equivalence classes of similar-sounding words that speeds up recognition without significant deterioration in recognition accuracy.
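To illustrate the general idea of choosing the word sequence with the shortest encoding, here is a minimal sketch in Python. It is not the model used in the paper: the toy lexicon, the word priors, the single `P_CORRECT` phoneme-accuracy parameter, and the restriction to substitution errors only (no insertions or deletions, no part-of-speech information) are all assumptions made for illustration. The message length of a candidate sentence is taken as the bits needed to state each word plus the bits needed to state the observed phonemes given that word's canonical pronunciation.

```python
import math

# Hypothetical toy lexicon: word -> (prior probability, canonical phonemes).
LEXICON = {
    "a":    (0.4, ("ah",)),
    "cat":  (0.2, ("k", "ae", "t")),
    "cats": (0.1, ("k", "ae", "t", "s")),
    "sat":  (0.2, ("s", "ae", "t")),
    "at":   (0.1, ("ae", "t")),
}
PHONEMES = sorted({p for _, phones in LEXICON.values() for p in phones})
P_CORRECT = 0.9  # assumed probability that a phoneme is recognized correctly


def phoneme_bits(observed, canonical):
    """Bits to encode one observed phoneme given the canonical one."""
    if observed == canonical:
        return -math.log2(P_CORRECT)
    # Assumption: errors are spread uniformly over the other phonemes.
    return -math.log2((1 - P_CORRECT) / (len(PHONEMES) - 1))


def word_bits(word, observed):
    """Bits to encode a word plus its observed phoneme realization."""
    prior, canonical = LEXICON[word]
    if len(observed) != len(canonical):
        return math.inf  # this sketch handles substitutions only
    bits = -math.log2(prior)
    bits += sum(phoneme_bits(o, c) for o, c in zip(observed, canonical))
    return bits


def decode(phonemes):
    """Return (words, bits) for the segmentation with minimum message length."""
    n = len(phonemes)
    best = [math.inf] * (n + 1)  # best[i]: fewest bits to encode phonemes[:i]
    back = [None] * (n + 1)      # back[i]: (start index, word) ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for word, (_, canonical) in LEXICON.items():
            start = end - len(canonical)
            if start < 0 or best[start] == math.inf:
                continue
            cost = best[start] + word_bits(word, tuple(phonemes[start:end]))
            if cost < best[end]:
                best[end] = cost
                back[end] = (start, word)
    if best[n] == math.inf:
        return None, math.inf
    words, pos = [], n
    while pos > 0:
        pos, word = back[pos]
        words.append(word)
    return words[::-1], best[n]
```

With this toy lexicon, `decode(["k", "ae", "t", "s", "ae", "t"])` prefers "cat sat" over "cats at" because the former has the shorter total encoding, and a misrecognized input such as `["k", "ah", "t"]` is still decoded as "cat" at the cost of extra bits for the substituted phoneme.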
