Paper information - Acoustic unit discovery and pronunciation generation from a grapheme-based lexicon

We present a framework for discovering acoustic units and generating an associated pronunciation lexicon from an initial grapheme-based recognition system. Our approach makes two distinct contributions. First, context-dependent grapheme models are clustered using spectral clustering to create a set of phone-like acoustic units. Second, we transform the pronunciation lexicon using an approach based on statistical machine translation: pronunciation hypotheses generated by decoding the training set are used to create a phrase-based translation table. We propose a novel method for scoring the phrase-based rules that significantly improves the output of the transformation process. Results on an English-language dataset demonstrate that the combined methods provide a 13% relative reduction in word error rate compared to a baseline grapheme-based system. Our approach could potentially be applied to low-resource languages without existing lexicons, such as those targeted by the Babel project.
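To make the phrase-table step more concrete, the following is a minimal sketch of how a phrase-based translation table could be estimated from pairs of lexicon pronunciations and decoded pronunciation hypotheses. It uses plain relative-frequency scoring, which is the conventional baseline and not the novel scoring method proposed in the paper. The function name `build_phrase_table`, the unit labels (`u1`, `u7`, ...), the toy data, and the assumption that the two sequences are aligned one-to-one are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_phrase_table(pairs, max_len=2):
    """Estimate p(target phrase | source phrase) by relative frequency.

    pairs: list of (source_units, target_units), where each element is a list
    of unit labels. This sketch assumes the two sequences have equal length
    and are aligned position by position (a simplifying assumption).
    """
    joint = Counter()          # counts of (source phrase, target phrase)
    source_totals = Counter()  # counts of each source phrase

    for src, tgt in pairs:
        if len(src) != len(tgt):
            raise ValueError("sketch assumes one-to-one aligned sequences")
        # Extract all contiguous phrases of up to max_len units.
        for n in range(1, max_len + 1):
            for i in range(len(src) - n + 1):
                s = tuple(src[i:i + n])
                t = tuple(tgt[i:i + n])
                joint[(s, t)] += 1
                source_totals[s] += 1

    table = defaultdict(dict)
    for (s, t), count in joint.items():
        table[s][t] = count / source_totals[s]
    return dict(table)


# Hypothetical toy data: grapheme pronunciations from the lexicon (source)
# paired with decoded hypotheses in discovered acoustic units (target).
pairs = [
    (["c", "a", "t"], ["u7", "u2", "u5"]),
    (["c", "o", "t"], ["u7", "u1", "u5"]),
    (["c", "e", "l", "l"], ["u9", "u3", "u4", "u4"]),
]

table = build_phrase_table(pairs)

# Rules for the grapheme "c", sorted from most to least probable.
for target, prob in sorted(table[("c",)].items(), key=lambda kv: -kv[1]):
    print(("c",), "->", target, round(prob, 3))
```

In a full system the alignment would come from a word aligner and the table would be consumed by a decoder such as Moses; this sketch only illustrates the counting step that turns decoded hypotheses into scored rewrite rules.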
