Vowel Imitation Using Vocal Tract Model and Recurrent Neural Network

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) that differ from those of infants' vocal tracts. To solve this problem, a model based on the motor theory of speech perception was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds, that is, sounds the system itself has never generated. The system was implemented using a Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model called the Maeda model. Experimental results demonstrated that the system was sufficiently robust to individual differences in speech sounds and could imitate unexperienced vowel sounds.
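As a rough illustration of how a parametric bias (PB) vector can be estimated for a sound the network did not produce itself, the following minimal sketch may help. It is not the system described above: the network is a simplified recurrent predictor, the weights are untrained random stand-ins, the vocal tract model is omitted, and the PB gradient is approximated by finite differences rather than backpropagation through time. All names (`rollout`, `recognize_pb`, `rand_matrix`) and all dimensions are hypothetical.

```python
import math
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rollout(pb, weights, x0, steps):
    """Closed-loop prediction: each output is fed back as the next input."""
    W_in, W_pb, W_ctx, W_out = weights
    x, ctx, outputs = list(x0), [0.0] * len(W_ctx), []
    for _ in range(steps):
        pre = [a + b + c for a, b, c in
               zip(matvec(W_in, x), matvec(W_pb, pb), matvec(W_ctx, ctx))]
        ctx = [math.tanh(p) for p in pre]               # context (hidden) state
        x = [math.tanh(o) for o in matvec(W_out, ctx)]  # predicted next frame
        outputs.append(x)
    return outputs

def mse(seq_a, seq_b):
    """Mean squared error between two sequences of equal shape."""
    diffs = [(a - b) ** 2 for ra, rb in zip(seq_a, seq_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def recognize_pb(target, weights, x0, pb_dim, iters=200, lr=0.5, eps=1e-4):
    """Fit only the PB vector to a target sequence; the weights stay fixed."""
    pb = [0.0] * pb_dim
    loss = lambda p: mse(rollout(p, weights, x0, len(target)), target)
    current = loss(pb)
    history = [current]
    for _ in range(iters):
        grad = []
        for i in range(pb_dim):  # central finite-difference gradient
            up, down = pb[:], pb[:]
            up[i] += eps
            down[i] -= eps
            grad.append((loss(up) - loss(down)) / (2 * eps))
        candidate = [p - lr * g for p, g in zip(pb, grad)]
        cand_loss = loss(candidate)
        if cand_loss < current:  # accept only steps that reduce the error
            pb, current = candidate, cand_loss
        else:                    # otherwise shrink the step size
            lr *= 0.5
        history.append(current)
    return pb, history

# Hypothetical setup: 3 feature dimensions, 8 context units, 2 PB units.
rng = random.Random(0)

def rand_matrix(rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

weights = (rand_matrix(8, 3), rand_matrix(8, 2), rand_matrix(8, 8), rand_matrix(3, 8))
x0 = [0.1, -0.2, 0.3]
target = rollout([0.5, -0.3], weights, x0, steps=20)  # stand-in for a heard sound
pb_est, history = recognize_pb(target, weights, x0, pb_dim=2)
```

In this sketch the fitted `pb_est` plays the role of a compact code for the target sequence. In an imitation setting, one would expect such a code to be fed back into the trained network to generate the articulatory trajectory that drives the vocal tract model, although the details of that step are not given in the text above.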
