Paper information - Data driven example based continuous speech recognition

The dominant acoustic modeling methodology, based on Hidden Markov Models (HMMs), is known to have certain weaknesses. Partial solutions to these flaws have been presented, but the fundamental problem remains: compressing the data into a compact HMM discards useful information such as time dependencies and speaker information. In this paper, we look at pure example-based recognition as a solution to this problem. By replacing the HMM with the underlying examples, all information in the training data is retained. We show how information about speaker and environment can be used, introducing a new interpretation of adaptation. The basis for the recognizer is the well-known dynamic time warping (DTW) algorithm, which has often been used for small tasks. However, large-vocabulary speech recognition introduces new demands, resulting in an explosion of the search space. We show how this problem can be tackled using a data-driven approach that selects appropriate speech examples as candidates for DTW alignment.
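
To make the DTW building block mentioned in the abstract concrete, here is a minimal sketch of textbook DTW used for nearest-example classification. It is not the recognizer described in the paper: the function names `dtw_distance` and `nearest_example`, the Euclidean local distance, and the three-way step pattern are all assumptions chosen for illustration, and the data-driven candidate selection the abstract refers to is not shown.

```python
import math


def dtw_distance(query, template):
    """Cumulative DTW cost between two sequences of equal-dimension feature vectors.

    Assumptions (not from the paper): Euclidean local distance and the basic
    step pattern (vertical, horizontal, diagonal) with no slope constraints.
    """
    n, m = len(query), len(template)
    # cost[i][j] holds the best cumulative cost of aligning
    # query[:i] with template[:j]; row and column 0 act as a border.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(query[i - 1], template[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query frame advances alone
                cost[i][j - 1],      # template frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def nearest_example(query, examples):
    """Return the label of the stored example with the lowest DTW cost.

    `examples` is a hypothetical list of (label, template) pairs.
    """
    best_label, _ = min(examples, key=lambda ex: dtw_distance(query, ex[1]))
    return best_label
```

Comparing a query against every stored example in this way costs time proportional to the product of the sequence lengths for each pair, which illustrates why an exhaustive search becomes impractical as the number of examples grows.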
