Switching Dynamic System Models for Speech Articulation and Acoustics

A statistical generative model of the speech process is described that embeds substantially richer structure than the hidden Markov model (HMM), currently the predominant model in automatic speech recognition. This switching dynamic-system model generalizes and integrates the HMM and the piecewise stationary nonlinear dynamic system (state-space) model. Depending on the level and nature of the switching in the model design, the model can naturally represent several key properties of speech dynamics: the temporal structure of the speech acoustics, the articulatory movements that cause it, and the control of those movements by multidimensional targets correlated with the phonological (symbolic) units of speech, expressed as overlapping articulatory features.
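
To make the idea of a switching state-space model concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a scalar hidden "articulatory" state that moves toward a regime-specific target, a first-order Markov chain over regimes standing in for the phonological units, and a tanh mapping standing in for the nonlinear articulatory-to-acoustic relation. The function name, parameters, target-directed update rule, and tanh mapping are all illustrative assumptions.

```python
import math
import random


def simulate_switching_system(n_steps, transition, targets, rates,
                              state_noise=0.0, obs_noise=0.0, seed=0):
    """Simulate a scalar switching state-space model (illustrative only).

    transition[i][j] is the assumed probability of moving from regime i
    to regime j; targets[i] and rates[i] are the assumed target value and
    time constant (0 < rate < 1) of the hidden state in regime i.
    """
    rng = random.Random(seed)
    n_regimes = len(targets)
    regime = 0   # assumed initial regime
    z = 0.0      # assumed initial hidden state
    regimes, hidden, observed = [], [], []
    for _ in range(n_steps):
        # Discrete switching: sample the next regime from the Markov chain.
        regime = rng.choices(range(n_regimes), weights=transition[regime])[0]
        # Continuous dynamics: the hidden state relaxes toward the
        # target of the current regime, plus Gaussian state noise.
        r = rates[regime]
        z = r * z + (1.0 - r) * targets[regime] + rng.gauss(0.0, state_noise)
        # Nonlinear observation: tanh is a stand-in for the
        # articulatory-to-acoustic mapping, plus Gaussian observation noise.
        o = math.tanh(z) + rng.gauss(0.0, obs_noise)
        regimes.append(regime)
        hidden.append(z)
        observed.append(o)
    return regimes, hidden, observed


# Two regimes with different targets; regime 1 is absorbing here.
regimes, hidden, observed = simulate_switching_system(
    n_steps=3,
    transition=[[0.0, 1.0], [0.0, 1.0]],
    targets=[-1.0, 2.0],
    rates=[0.5, 0.5],
)
```

In this sketch, removing the continuous state leaves only the regime chain, which behaves like an HMM state sequence, while fixing a single regime leaves an ordinary nonlinear state-space model. That is one simple way to read the claim that the switching model integrates the two.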
