Efficient rule scoring for improved grapheme-based lexicons

For many languages, an expert-defined phonetic lexicon may not be available. A popular alternative is a grapheme-based lexicon, but the orthography of a language may differ significantly from its pronunciation. In our previous work, we proposed an approach based on statistical machine translation (SMT) for improving grapheme-based pronunciations. Without knowledge of the true target pronunciations, we built a phrase table in which each individual rule, when applied, improved the likelihood of the training data. This approach improved recognition accuracy, but at a significant computational cost. In this work, we propose an improvement that makes the process more than 80 times faster without decreasing recognition accuracy.
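The rule-selection criterion described above (keep a rule only if applying it improves the likelihood of the training data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `apply_rule`, `apply_to_lexicon`, `filter_rules`, and `toy_score` are hypothetical. Rules are assumed to be simple rewrites of one unit sequence into another, and `toy_score` is only a placeholder for the real criterion, which would be the likelihood of the training data under an acoustic model built from the modified lexicon.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a lexicon maps each word to its pronunciation as a list
# of units (initially the word's graphemes); a rule rewrites one unit
# sequence into another, e.g. ("c", "h") -> ("ch",).
Lexicon = Dict[str, List[str]]
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def apply_rule(pron: List[str], rule: Rule) -> List[str]:
    """Replace every non-overlapping, left-to-right match of the rule's source."""
    src, tgt = rule
    if not src:
        raise ValueError("rule source must be non-empty")
    out: List[str] = []
    i, n = 0, len(src)
    while i < len(pron):
        if tuple(pron[i:i + n]) == src:
            out.extend(tgt)
            i += n
        else:
            out.append(pron[i])
            i += 1
    return out


def apply_to_lexicon(lex: Lexicon, rule: Rule) -> Lexicon:
    """Return a new lexicon with the rule applied to every pronunciation."""
    return {word: apply_rule(pron, rule) for word, pron in lex.items()}


def filter_rules(lex: Lexicon, rules: List[Rule],
                 score: Callable[[Lexicon], float]) -> List[Rule]:
    """Keep each rule that, applied on its own to the baseline lexicon,
    raises the score (assumed stand-in for training-data likelihood)."""
    baseline = score(lex)
    return [r for r in rules if score(apply_to_lexicon(lex, r)) > baseline]


def toy_score(lex: Lexicon) -> float:
    """Placeholder score: prefers lexicons with fewer units overall.
    A real system would score the training data with an acoustic model."""
    return -float(sum(len(pron) for pron in lex.values()))


lexicon = {"chat": ["c", "h", "a", "t"], "much": ["m", "u", "c", "h"]}
candidates = [(("c", "h"), ("ch",)), (("a",), ("a", "a"))]
kept = filter_rules(lexicon, candidates, toy_score)
print(kept)
```

Because `score` is called once per candidate rule, the cost of this exhaustive check grows with the size of the phrase table and with the cost of each likelihood evaluation.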
