Rescoring multiple pronunciations generated from spelled words

Building on earlier work [2], we show how a set of binary decision trees grown with the Gelfand-Ravishankar-Delp algorithm [8] can be trained to generate an ordered list of possible pronunciations from a spelled word. Training is carried out on a database of spelled words paired with their pronunciations in a particular language. We then show how phonotactic information can be learned by a second set of decision trees, which reorder the multiple pronunciations generated by the first set. The paper defines the “inclusion” metric for scoring phoneticizers that generate multiple pronunciations. Experimental results using this metric indicate that phonotactic reordering yields a slight improvement when only the top pronunciation is retained, and a large improvement when more than one hypothesis is retained. We also give isolated-word recognition results that show good performance for automatically generated pronunciations.
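The abstract names the inclusion metric and the reordering step without defining them, so the following is only a minimal sketch under stated assumptions. It assumes that inclusion at n is the fraction of test words whose reference pronunciation appears among the top n hypotheses, and it stands in for the second set of decision trees with an arbitrary scoring function. The names `inclusion_at_n`, `rerank`, and `toy_score`, and the toy data, are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A pronunciation is modeled as a tuple of phoneme symbols (an assumption).
Pron = Tuple[str, ...]


def inclusion_at_n(hypotheses: Dict[str, List[Pron]],
                   reference: Dict[str, Pron],
                   n: int) -> float:
    """Fraction of reference words whose pronunciation is in the top-n list.

    This definition is an assumption; the paper's exact one may differ.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not reference:
        return 0.0
    hits = sum(
        1 for word, pron in reference.items()
        if pron in hypotheses.get(word, [])[:n]
    )
    return hits / len(reference)


def rerank(candidates: Sequence[Pron],
           score: Callable[[Pron], float]) -> List[Pron]:
    """Reorder candidates by descending score.

    `score` is a placeholder for the phonotactic model; sorted() is stable,
    so ties keep the order produced by the first stage.
    """
    return sorted(candidates, key=score, reverse=True)


# Toy first-stage output: ordered hypotheses per spelled word.
hyps = {
    "read": [("r", "iy", "d"), ("r", "eh", "d")],
    "lead": [("l", "eh", "d"), ("l", "iy", "d")],
}
refs = {"read": ("r", "eh", "d"), "lead": ("l", "eh", "d")}


def toy_score(pron: Pron) -> float:
    """Hypothetical scorer that simply prefers pronunciations containing 'eh'."""
    return 1.0 if "eh" in pron else 0.0


reordered = {word: rerank(cands, toy_score) for word, cands in hyps.items()}

print(inclusion_at_n(hyps, refs, 1))       # before reordering, top 1 only
print(inclusion_at_n(reordered, refs, 1))  # after reordering, top 1 only
print(inclusion_at_n(hyps, refs, 2))       # top 2 retained
```

Comparing the metric before and after `rerank`, at several values of n, mirrors the kind of comparison the abstract reports, though the numbers from this toy data carry no meaning.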

[1]  Xuedong Huang, et al. Improvements on a trainable letter-to-sound converter, 1997, EUROSPEECH.

[2]  Joseph Picone, et al. An advanced system to generate pronunciations of proper nouns, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Paul Dalsgaard, et al. Multi-lingual testing of a self-learning approach to phonemic transcription of orthography, 1995, EUROSPEECH.

[4]  Treebank Penn, et al. Linguistic Data Consortium, 1999.

[5]  Lori Lamel, et al. On designing pronunciation lexicons for large vocabulary continuous speech recognition, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Robert I. Damper, et al. A recurrent network that learns to pronounce English text, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[7]  Edward J. Delp, et al. An iterative growing and pruning algorithm for classification tree design, 1989, Conference Proceedings, IEEE International Conference on Systems, Man and Cybernetics.

[8]  Elmar Nöth, et al. Comparison of two tree-structured approaches for grapheme-to-phoneme conversion, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[9]  Yoshinori Sagisaka, et al. Automatic generation of a pronunciation dictionary based on a pronunciation network, 1997, EUROSPEECH.

[10]  François Yvon. Prononcer par analogie : motivation, formalisation et évaluation, 1996.