Feature-based pronunciation modeling with trainable asynchrony probabilities

We report on ongoing work on a pronunciation model based on an explicit representation of the evolution of multiple linguistic feature streams. In this type of model, most pronunciation variation is viewed as the result of two effects: asynchrony between feature streams and changes in feature values. We have implemented such a model using dynamic Bayesian networks. In this paper, we extend our previous work with a mechanism for learning feature asynchrony probabilities from data. We present experimental results on a word classification task using phonetic transcriptions of utterances from the Switchboard corpus.
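
To make the idea of a trainable asynchrony probability concrete, the following is a minimal sketch and not the implementation described here. It assumes a simplified setting in which the per-frame position of each of two feature streams within the word is fully observed, and it defines the degree of asynchrony as the absolute difference between the two positions. The function name `estimate_async_probs`, the parameters `max_async` and `smoothing`, and the toy alignment are all hypothetical. With hidden alignments, the observed counts below would be replaced by expected counts, as in EM.

```python
from collections import Counter


def estimate_async_probs(index_pairs, max_async, smoothing=1.0):
    """Estimate P(asynchrony = d) for d = 0..max_async.

    index_pairs: list of (i, j) tuples, one per frame, giving the position
                 of two feature streams within the word (an assumed,
                 fully observed alignment).
    max_async:   largest asynchrony the (hypothetical) model allows.
    smoothing:   additive smoothing constant so unseen degrees keep
                 nonzero probability.
    """
    counts = Counter()
    for i, j in index_pairs:
        d = abs(i - j)  # degree of asynchrony in this frame
        if d > max_async:
            raise ValueError(f"asynchrony {d} exceeds max_async={max_async}")
        counts[d] += 1

    # Relative-frequency estimate with additive smoothing.
    total = len(index_pairs) + smoothing * (max_async + 1)
    return [(counts[d] + smoothing) / total for d in range(max_async + 1)]


# Toy alignment: the first stream runs one position ahead in two frames.
frames = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (2, 2)]
probs = estimate_async_probs(frames, max_async=2)
for d, p in enumerate(probs):
    print(f"P(async={d}) = {p:.3f}")
```

In this toy case the streams are synchronized in four of six frames, so most of the probability mass falls on an asynchrony of zero, while smoothing reserves a small amount for the unseen degree of two.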
