Stochastic calculus, non-linear filtering, and the internal model principle: implications for articulatory speech recognition

A stochastic approach to modelling speech production and perception is discussed, based on Itô calculus. Speech is modelled by a system of non-linear stochastic differential equations evolving on a finite-dimensional state space, representing a partially observed Markov process. The optimal non-linear filtering equations for the model are stated and shown to exhibit a predictor-corrector structure that mimics the structure of the original system. This structure is used to suggest a possible justification for the hypothesis that speakers and listeners make use of an “internal model” in producing and perceiving speech, and it leads to a useful statistical framework for articulatory speech recognition.
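
As an illustration of the predictor-corrector structure mentioned above, the following sketch writes out a generic partially observed diffusion together with the standard Kushner-Stratonovich form of the optimal filter from the non-linear filtering literature. The symbols f, g, h, W, V, the test function \varphi, and the unit observation-noise covariance are assumptions made for this sketch; they are not necessarily the notation or the exact model used in the paper.

```latex
% Assumed generic model: hidden state X_t (e.g. articulatory), observation Y_t (e.g. acoustic),
% with W_t and V_t independent standard Brownian motions.
\begin{align}
  dX_t &= f(X_t)\,dt + g(X_t)\,dW_t, \\
  dY_t &= h(X_t)\,dt + dV_t .
\end{align}

% Conditional expectation of a test function \varphi given the observations up to time t:
%   \pi_t(\varphi) = E[\varphi(X_t) \mid Y_s,\ 0 \le s \le t].
% Generator of the state process:
%   \mathcal{L}\varphi = f \cdot \nabla\varphi
%       + \tfrac{1}{2}\,\mathrm{tr}\!\left(g g^{\top} \nabla^2 \varphi\right).
\begin{equation}
  d\pi_t(\varphi)
    = \underbrace{\pi_t(\mathcal{L}\varphi)\,dt}_{\text{prediction}}
    + \underbrace{\bigl(\pi_t(h\varphi) - \pi_t(h)\,\pi_t(\varphi)\bigr)^{\top}
      \bigl(dY_t - \pi_t(h)\,dt\bigr)}_{\text{correction}} .
\end{equation}
```

In this form the prediction term propagates the estimate through the same dynamics as the state equation, while the correction term is driven by the innovation dY_t - \pi_t(h) dt, the discrepancy between the observed increment and the one predicted from the current estimate. This is the sense in which the filter can be read as carrying a copy of the original system inside it.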
