A novel neural-based model for acoustic-articulatory inversion mapping

In this paper, a new bidirectional neural network is proposed for improved acoustic-articulatory inversion mapping. The model is motivated by the parallel structure of the human brain, which processes information through forward and reverse connections; in other words, there is feedback from the articulatory system to the acoustic signals it emits. Inspired by this mechanism, the bidirectional model is developed to map speech representations to articulatory features. Attractor dynamics are formed in the bidirectional model by first training the reference speaker's subspace as a continuous attractor; the model is then used to recognize other speakers' speech. The structure and training of the bidirectional model are designed so that the network learns to denoise the signal step by step, using the properties of the attractors it has formed. In this work, the efficiency of a nonlinear feedforward network is compared with that of the same network augmented with a bidirectional connection. The bidirectional model increases phone recognition accuracy by approximately 3 percentage points (from 62.09% to 64.91%).
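
The abstract describes the mechanism only at a high level. The sketch below illustrates one plausible reading of it in Python: a forward network maps acoustic frames to articulatory features, a feedback network maps them back to the acoustic domain, and iterating the two passes lets a noisy input relax toward the attractor learned from the reference speaker before the final inversion. All dimensions, activations, and the relaxation rule are assumptions for illustration, not the authors' specification.

```python
# Minimal sketch of a forward-feedback (bidirectional) inversion loop.
# ASSUMPTIONS: layer sizes, tanh activations, and the relaxation update
# are illustrative only; the abstract does not specify them.
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random weights for a small tanh MLP (illustrative initialisation)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    for w, b in params:
        x = np.tanh(x @ w + b)
    return x

# Hypothetical dimensions: 39-dim acoustic frames (e.g. MFCC-like features),
# 14-dim articulatory features (e.g. EMA-style coil coordinates).
forward_net = mlp_init([39, 64, 14])   # acoustic -> articulatory
feedback_net = mlp_init([14, 64, 39])  # articulatory -> acoustic (reverse path)

def bidirectional_inversion(acoustic_frame, n_iters=3, alpha=0.5):
    """Iteratively denoise the acoustic frame via the forward-feedback loop,
    then return the articulatory estimate (a sketch of the idea, not the
    authors' exact algorithm)."""
    x = acoustic_frame
    for _ in range(n_iters):
        artic = mlp_forward(forward_net, x)          # forward pass
        x_recon = mlp_forward(feedback_net, artic)   # feedback pass
        x = (1 - alpha) * x + alpha * x_recon        # relax toward attractor
    return mlp_forward(forward_net, x)

# Usage on a random (noisy) acoustic frame:
noisy_frame = rng.standard_normal(39)
articulatory_estimate = bidirectional_inversion(noisy_frame)
print(articulatory_estimate.shape)  # (14,)
```

In this reading, the feedback path plays the denoising role attributed to the attractor dynamics: each forward-feedback cycle pulls the input closer to the reference-speaker subspace on which the networks were trained.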
