Automatically generated word pronunciations from phoneme classifier output

An automatic procedure for modeling alternative pronunciations of words produced by different talkers is described. The research compared recognition performance on forty city and state names using three different representations of each word. In the first case, the expected pronunciations of each word were specified by an expert. In the second case, a dynamic programming algorithm was used to create a pronunciation network for each word by combining phonetic transcriptions, made by human labelers, of ten utterances of the word. The third case was identical to the second, except that the phonetic labels were provided automatically by a phonetic recognition algorithm. On a test set of words produced by new speakers, the pronunciation networks derived from human labels and from machine labels gave equivalent recognition performance, and both outperformed the pronunciations specified by the expert.
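The abstract does not give the details of the dynamic programming step, so the following is only a minimal sketch of one way several transcriptions of a word could be merged into a pronunciation network. It assumes unit-cost edit-distance alignment and a simple network made of a sequence of slots, each holding a set of alternative phones; the function names, the `GAP` marker, and the ARPAbet-like phone labels are all hypothetical and are not taken from the paper.

```python
from itertools import product

GAP = "-"  # hypothetical marker: the slot may be skipped


def align_to_network(network, phones):
    """Merge one transcription into a list of phone sets via edit-distance DP."""
    n, m = len(network), len(phones)
    # cost[i][j]: cheapest alignment of network[:i] with phones[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if phones[j - 1] in network[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitute
                             cost[i - 1][j] + 1,        # slot skipped
                             cost[i][j - 1] + 1)        # extra phone
    # Trace back from the end, merging the new phones into the slots.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if phones[j - 1] in network[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(network[i - 1] | {phones[j - 1]})
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(network[i - 1] | {GAP})   # this talker skipped the slot
            i -= 1
        else:
            merged.append({GAP, phones[j - 1]})     # new optional slot
            j -= 1
    merged.reverse()
    return merged


def build_network(transcriptions):
    """Fold all transcriptions of one word into a single slot network."""
    network = [{p} for p in transcriptions[0]]
    for phones in transcriptions[1:]:
        network = align_to_network(network, phones)
    return network


def pronunciations(network):
    """Enumerate every phone string the network accepts."""
    return {tuple(p for p in path if p != GAP) for path in product(*network)}


# Illustrative (made-up) transcriptions of one city name.
boston = [
    ["b", "aa", "s", "t", "ax", "n"],
    ["b", "ao", "s", "t", "ix", "n"],
    ["b", "aa", "s", "ax", "n"],
]
net = build_network(boston)
```

In this sketch the merged network accepts pronunciations that no single talker produced, such as `b ao s ax n`, which is one plausible reason a network built from several utterances might generalize to new speakers better than a fixed list of expected forms.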
