Dynamic sharing of Gaussian densities using phonetic features

This paper describes a way to adapt a recognizer to pronunciation variability by dynamically sharing Gaussian densities across phonetic models. The method comprises three steps. First, given an input utterance, an HMM recognizer outputs a lattice of the most likely word hypotheses. Then, the canonical pronunciation of each hypothesis is checked by comparing its theoretical phonetic features to those automatically extracted from the speech signal. If the comparison shows that a phoneme of a hypothesis was likely pronounced differently, its model is transformed by sharing its Gaussian densities with those of its possible alternate phone realization(s). Finally, the transformed models are used in a second recognition pass. The sharing is dynamic because it is automatically adapted to each input utterance. Experiments showed a 5.4% relative reduction in word error rate over the baseline and a 2.7% relative reduction over a static sharing method.
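To make the sharing step concrete, the sketch below shows one plausible way to merge the Gaussian mixture of a phoneme's model with the densities of its detected alternate realizations before a second recognition pass. This is a minimal illustration, not the paper's implementation: the data layout (`phone_models` as a dictionary of mixture parameters), the function name `share_densities`, and the simple pool-and-renormalize scheme are all assumptions; the paper's exact pooling and weighting may differ.

```python
import numpy as np

def share_densities(phone_models, hypothesis_phone, alternate_phones):
    """Augment the Gaussian mixture of `hypothesis_phone` with the
    densities of its likely alternate realizations.

    `phone_models` is assumed to map each phone label to a dict with
    keys "means" (K x D), "covs" (K x D, diagonal covariances), and
    "weights" (K,). All names here are illustrative.
    """
    target = phone_models[hypothesis_phone]
    for alt in alternate_phones:
        alt_model = phone_models[alt]
        # Pool the alternate phone's component means, covariances,
        # and mixture weights into the hypothesized phone's model.
        target["means"] = np.vstack([target["means"], alt_model["means"]])
        target["covs"] = np.vstack([target["covs"], alt_model["covs"]])
        target["weights"] = np.concatenate(
            [target["weights"], alt_model["weights"]]
        )
    # Renormalize so the enlarged mixture's weights sum to one again.
    target["weights"] = target["weights"] / target["weights"].sum()
    return target

# Example: phonetic-feature comparison suggested /t/ may have been
# realized as /d/, so /t/'s model borrows /d/'s densities.
phone_models = {
    "t": {"means": np.zeros((2, 13)), "covs": np.ones((2, 13)),
          "weights": np.array([0.5, 0.5])},
    "d": {"means": np.ones((2, 13)), "covs": np.ones((2, 13)),
          "weights": np.array([0.6, 0.4])},
}
share_densities(phone_models, "t", ["d"])
```

Because the densities are shared rather than copied or retrained, the transformed model broadens its acoustic coverage for this utterance only; a fresh first-pass lattice on the next utterance yields a different set of sharings.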