MAGE - A Platform for Tangible Speech Synthesis

In this paper, we describe our pioneering work in developing speech synthesis beyond the Text-To-Speech paradigm. We introduce tangible speech synthesis as an alternative way of envisioning how artificial speech content can be produced. Tangible speech synthesis refers to the ability of a system to give physicality and interactivity to important speech production parameters. We present MAGE, our new software platform for high-quality reactive speech synthesis, based on statistical parametric modeling and, more particularly, hidden Markov models. We also introduce a new HandSketch-based musical instrument. This instrument brings pen- and posture-based interaction on top of MAGE and serves as a first proof of concept.
