Statistical modeling of phonological rules through linguistic hierarchies

Abstract: This paper describes our research aimed at acquiring a generalized probability model for alternative phonetic realizations in conversational speech. All of our experiments use the SUMMIT landmark-based speech recognition framework. The approach begins with a set of formal context-dependent phonological rules, which are applied to the baseforms in the recognizer's lexicon. A large speech corpus is then phonetically aligned using a forced recognition procedure, and the probability model is acquired by observing the specific realizations that appear in these alignments. A set of context-free rules is used to parse words into sub-word structure, so that context-dependent probabilities generalize to other words that share the same sub-word context. The model maps phones to sub-word units probabilistically in a finite-state transducer framework, capturing phonetic predictions based on local phonemic, morphological, and syllabic contexts. We experimented within two domains: the MERCURY flight reservation domain and the JUPITER weather domain. The baseline system used the same set of phonological rules for lexical expansion, but with no probabilities for the alternates. We achieved a 14.4% relative reduction in concept error rate for JUPITER and a 16.5% relative reduction for MERCURY.
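The idea of acquiring probabilities from forced alignments, keyed on sub-word context rather than on whole words, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes plain relative-frequency estimation, and the function name, the context labels ("onset", "coda"), and the realization labels ("released", "unreleased", "deleted") are hypothetical placeholders rather than SUMMIT's actual units or phone set.

```python
from collections import Counter, defaultdict


def estimate_realization_probs(alignments):
    """Estimate P(realization | context, unit) by relative frequency.

    `alignments` is an iterable of (context, unit, realization) triples,
    a hypothetical stand-in for what a forced alignment would provide.
    Counts are pooled over every word that contains the same
    (context, unit) pair, which is what lets the estimate carry over
    to words sharing that sub-word context.
    """
    counts = defaultdict(Counter)
    for context, unit, realization in alignments:
        counts[(context, unit)][realization] += 1

    probs = {}
    for key, realization_counts in counts.items():
        total = sum(realization_counts.values())
        probs[key] = {r: c / total for r, c in realization_counts.items()}
    return probs


# Toy data (invented for illustration only): realizations of /t/
# observed in syllable-final versus syllable-initial position.
toy_alignments = [
    ("coda", "t", "released"),
    ("coda", "t", "unreleased"),
    ("coda", "t", "unreleased"),
    ("coda", "t", "deleted"),
    ("onset", "t", "released"),
    ("onset", "t", "released"),
]

probs = estimate_realization_probs(toy_alignments)
for key in sorted(probs):
    print(key, probs[key])
```

In a finite-state transducer setting, each estimated probability would typically be attached as a weight (for example, a negative log probability) to the arc that produces the corresponding realization. How the weights are actually organized in the recognizer is not specified in the abstract.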
