Context-dependent acoustic modeling using graphemes for large vocabulary speech recognition

In this paper, we propose using a decision tree based on graphemic acoustic sub-word units together with phonetic questions. We also show that automatic question generation can eliminate the manual effort entirely.
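As a rough illustration of decision-tree state tying over grapheme contexts, the sketch below picks the context question that yields the largest log-likelihood gain when a pool of context-dependent grapheme units is split in two. It is a minimal sketch under stated assumptions, not the system described in this paper: the single one-dimensional Gaussian per node, the `Stats` container, the `best_question` helper, and the toy question sets are all hypothetical.

```python
import math
from typing import NamedTuple


class Stats(NamedTuple):
    """Sufficient statistics of a 1-D feature for one context-dependent unit."""
    n: float   # number of frames
    s: float   # sum of feature values
    ss: float  # sum of squared feature values


def pooled(stats):
    """Merge the statistics of several units into one node."""
    return Stats(sum(x.n for x in stats),
                 sum(x.s for x in stats),
                 sum(x.ss for x in stats))


def log_likelihood(st, var_floor=1e-3):
    """Log-likelihood of the node's data under its own ML Gaussian.

    Exact when the variance is not floored; the floor only guards
    against degenerate nodes.
    """
    if st.n == 0:
        return 0.0
    mean = st.s / st.n
    var = max(st.ss / st.n - mean * mean, var_floor)
    return -0.5 * st.n * (math.log(2.0 * math.pi * var) + 1.0)


def best_question(units, questions):
    """Return (gain, position, question name) of the best split, or None.

    units:     dict mapping (left, center, right) graphemes -> Stats
    questions: dict mapping a question name -> set of graphemes
    """
    parent = log_likelihood(pooled(list(units.values())))
    best = None
    for name, members in questions.items():
        for position, idx in (("left", 0), ("right", 2)):
            yes = [st for ctx, st in units.items() if ctx[idx] in members]
            no = [st for ctx, st in units.items() if ctx[idx] not in members]
            if not yes or not no:
                continue  # the question does not split this node
            gain = (log_likelihood(pooled(yes))
                    + log_likelihood(pooled(no)) - parent)
            if best is None or gain > best[0]:
                best = (gain, position, name)
    return best


# Toy data (invented): four contexts of the center grapheme "a".
example_units = {
    ("e", "a", "n"): Stats(10, 50.0, 260.0),
    ("i", "a", "t"): Stats(10, 52.0, 280.4),
    ("b", "a", "n"): Stats(10, 0.0, 10.0),
    ("p", "a", "t"): Stats(10, 2.0, 10.4),
}
# Hypothetical question sets; real ones would be phonetically motivated
# or generated automatically.
example_questions = {
    "vowel": {"a", "e", "i", "o", "u"},
    "nasal": {"m", "n"},
}

gain, position, name = best_question(example_units, example_questions)
print(f"split on {position} context, question '{name}', gain {gain:.2f}")
```

Applying the same selection recursively to the two resulting nodes, and stopping when the gain or the frame count falls below a threshold, would grow the full tree; the leaves then define the tied states.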
