Investigation of Gesture Controlled Articulatory Vocal Synthesizer using a Bio-Mechanical Mapping Layer

We have added a dynamic bio-mechanical mapping layer to a real-time, gesture-controlled voice synthesizer used for musical performance and speech research. The layer contains a model of the human vocal tract that takes tongue muscle activations as input and produces tract geometry as output. Using this mapping layer, we conducted user studies comparing two control schemes: driving the model's muscle activations with a 2D set of force sensors, and a position-controlled kinematic input space that maps directly to the sound. Preliminary user evaluation suggests that force input was more difficult to use, but that the resulting output was more intelligible and natural than that of the kinematic controller. This result indicates that force input is potentially feasible for browsing a vowel space in an articulatory voice synthesis system, although further evaluation is required.
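The pipeline described above can be sketched in code. This is a minimal, illustrative stand-in, not the authors' implementation: the muscle names follow common tongue-model conventions, but the force-to-activation mapping and the toy geometry model are assumptions. A real system would run a finite-element tongue model (e.g. in ArtiSynth) where the stub below blends a neutral tube with activation-driven constrictions.

```python
# Hedged sketch of the mapping layer: 2D force input -> tongue muscle
# activations -> vocal-tract geometry. All names and mappings are
# hypothetical illustrations of the architecture, not the paper's code.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TractGeometry:
    """Area function: cross-sectional areas (cm^2) from glottis to lips."""
    areas: List[float]


def force_to_activations(fx: float, fy: float) -> Dict[str, float]:
    """Map a normalized 2D force reading (0..1 per axis) to activation
    levels for three tongue muscles. The specific assignment is an
    assumption for illustration."""
    fx = min(max(fx, 0.0), 1.0)
    fy = min(max(fy, 0.0), 1.0)
    return {
        "genioglossus_posterior": fx,       # pushes the tongue body forward/up
        "styloglossus": fy,                 # retracts the tongue toward the velum
        "genioglossus_anterior": 1.0 - fy,  # fronts/lowers the tongue tip
    }


def activations_to_geometry(act: Dict[str, float],
                            n_sections: int = 8) -> TractGeometry:
    """Toy biomechanical stand-in: start from a uniform neutral tube and
    apply activation-driven constrictions (posterior muscles narrow the
    back of the tract, anterior ones the front)."""
    neutral = 3.0  # cm^2, uniform neutral tube
    areas = []
    for i in range(n_sections):
        pos = i / (n_sections - 1)  # 0 = glottis, 1 = lips
        constriction = (act["styloglossus"] * (1.0 - pos)
                        + act["genioglossus_anterior"] * pos)
        # Clamp so the tract never fully closes in this sketch.
        areas.append(max(0.3, neutral * (1.0 - 0.8 * constriction)))
    return TractGeometry(areas)


# The key design point: force input drives muscle activations, and the
# biomechanical model (not the performer) determines the tract shape,
# unlike a kinematic controller that maps position directly to sound.
activations = force_to_activations(0.4, 0.7)
geometry = activations_to_geometry(activations)
print([round(a, 2) for a in geometry.areas])
```

The resulting area function would then feed an acoustic stage (e.g. a Webster's-equation solver, as in reference-grade articulatory synthesizers) to produce sound.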
