Paper information - Glove-TalkII: a neural-network interface which maps gestures to parallel formant speech synthesizer controls


Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural-sounding pitch variations than a text-to-speech synthesizer.
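The gating arrangement described in the abstract can be illustrated with a minimal sketch. It assumes the gating network emits a single weight between 0 and 1 and that the two expert outputs are combined as a convex blend; the abstract only says the gating network "weights" the two outputs, so the exact combination rule is an assumption here. The names `blend_controls` and `gesture_to_controls`, and the stub networks in the usage lines, are hypothetical and serve only as illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Network = Callable[[Sequence[float]], Vector]


def blend_controls(gate: float, vowel: Sequence[float], consonant: Sequence[float]) -> Vector:
    """Blend two synthesizer-control vectors with a single gate weight.

    Assumption: gate = 1.0 means "pure vowel" and gate = 0.0 means
    "pure consonant"; values in between mix the two linearly.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(vowel) != len(consonant):
        raise ValueError("vowel and consonant outputs must have equal length")
    return [gate * v + (1.0 - gate) * c for v, c in zip(vowel, consonant)]


def gesture_to_controls(
    hand_input: Sequence[float],
    gating_net: Callable[[Sequence[float]], float],
    vowel_net: Network,
    consonant_net: Network,
) -> Vector:
    """Run the three networks on one input frame and blend the expert outputs."""
    gate = gating_net(hand_input)
    return blend_controls(gate, vowel_net(hand_input), consonant_net(hand_input))


# Usage with hypothetical stub networks (not the trained networks from the paper).
frame = [0.2, 0.8]
controls = gesture_to_controls(
    frame,
    gating_net=lambda x: 0.5,          # stub: equal weighting
    vowel_net=lambda x: [0.0, 2.0],    # stub vowel-expert output
    consonant_net=lambda x: [1.0, 4.0],  # stub consonant-expert output
)
print(controls)
```

In this sketch the fixed mappings for volume, fundamental frequency, and stop consonants are left out; they would bypass the blend entirely.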

Geoffrey E. Hinton | Sidney S. Fels
