Data-driven model construction for continuous speech recognition using overlapping articulatory features

This paper presents a new, data-driven approach to deriving HMMs based on overlapping articulatory features for speech recognition. The approach uses speech data from the University of Wisconsin's Microbeam X-ray Speech Production Database, and regression tree models derived from these data are used to construct the HMMs. Using measured articulatory data improves upon our previous rule-based feature-overlapping system. Given an utterance's phonetic transcription and some prosodic information, the regression trees allow the HMM topology to be constructed for an arbitrary utterance. Experimental results in automatic speech recognition (ASR) show the preliminary success of this approach.
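As a rough illustration of how overlapping features can define an HMM topology, the Python sketch below lays out feature values on parallel articulatory tiers, lets a feature spread backward into the preceding phone by a given amount, and then creates one left-to-right state for each distinct combination of feature values over time. This is a minimal sketch of the general overlapping-feature idea under stated assumptions, not the paper's procedure: the tier names, feature values, and overlap amounts are hypothetical, and the overlaps are supplied by hand rather than predicted by regression trees from phonetic context and prosody.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Segment:
    value: str    # feature value on one articulatory tier, e.g. "lowered"
    start: float  # start time, in units of phone slots
    end: float    # end time, in units of phone slots


def tier_segments(values: list[str], overlaps: list[float]) -> list[Segment]:
    """Lay out one tier. values[i] is the feature for phone i; overlaps[i]
    (assumed 0 <= x < 1) is how far the feature of phone i+1 spreads back
    into phone i. Hypothetical: in the paper such amounts come from
    regression trees; here they are supplied by hand."""
    segs: list[Segment] = []
    start = 0.0
    for i, v in enumerate(values):
        # The last phone has no right neighbour, so it keeps its full slot.
        end = i + 1 - (overlaps[i] if i + 1 < len(values) else 0.0)
        if segs and segs[-1].value == v:
            segs[-1].end = end  # merge repeated values on the same tier
        else:
            segs.append(Segment(v, start, end))
        start = end
    return segs


def build_states(tiers: dict[str, list[Segment]]) -> list[dict[str, str]]:
    """Cut the timeline at every boundary on any tier; each distinct
    combination of feature values becomes one left-to-right state."""
    bounds = sorted({t for segs in tiers.values() for s in segs for t in (s.start, s.end)})
    states: list[dict[str, str]] = []
    for lo, hi in zip(bounds, bounds[1:]):
        mid = (lo + hi) / 2
        # Exactly one segment per tier covers the midpoint of this interval.
        combo = {name: s.value for name, segs in tiers.items()
                 for s in segs if s.start <= mid < s.end}
        if not states or states[-1] != combo:
            states.append(combo)
    return states


# Hypothetical two-tier example for /a m/: the velum lowering of /m/
# spreads 0.4 of a phone back into the vowel; lip closure does not.
tiers = {
    "lips": tier_segments(["open", "closed"], [0.0]),
    "velum": tier_segments(["raised", "lowered"], [0.4]),
}
for state in build_states(tiers):
    print(state)
```

With no overlap the two phones yield two states; with the velum feature spreading back into the vowel, an extra nasalized-vowel state (lips open, velum lowered) appears between them, which is the kind of context-dependent topology change the overlap amounts control.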