When individuals with speech disabilities such as dysarthria try to communicate using speech, they often have to rely on speech synthesizers that require them to type word or sound symbols. This input method makes real-time operation difficult, and dysarthric users cannot control the flow of conversation. In this study, we are developing a novel speech synthesizer in which speech is generated not from symbol input but from hand motions. In recent years, statistical voice conversion techniques based on space mapping between given parallel data sequences have been proposed. By applying these methods, a hand space and a vowel space are mapped, and a converter from hand motions to vowel transitions is developed. It has been reported that the proposed method can generate the five Japanese vowels. In this paper, we discuss extending this system to consonant generation and pitch control. For the former, two methods are examined: waveform concatenation and space mapping for consonant sounds. For the latter, pitch control is realized using the posture of the arm, measured by a magnetic sensor.
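The abstract does not spell out the form of the hand-to-vowel converter. The sketch below illustrates one standard statistical voice-conversion formulation that matches the description: a joint-density Gaussian mixture model trained on parallel hand-motion/vowel feature pairs, with conversion by the conditional mean of the vowel features given the hand features. The feature dimensions, mixture count, and the use of scikit-learn and SciPy are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

# Hypothetical dimensions: 6-D hand-shape features (e.g. finger-joint angles)
# and 4-D vowel spectral features; the actual feature sets are assumptions.
DX, DY, M = 6, 4, 8

def train_joint_gmm(hand_feats, vowel_feats, n_mix=M):
    """Fit a GMM on joint [hand; vowel] vectors from time-aligned parallel data."""
    joint = np.hstack([hand_feats, vowel_feats])
    gmm = GaussianMixture(n_components=n_mix, covariance_type="full",
                          reg_covar=1e-4, random_state=0)
    gmm.fit(joint)
    return gmm

def convert(gmm, x):
    """Map one hand-feature vector x to an expected vowel feature vector
    via the GMM conditional mean E[y|x] (a mixture of linear regressions)."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    # Responsibility of each mixture for x, using only the hand-space marginal.
    resp = np.array([w * multivariate_normal.pdf(x, m[:DX], C[:DX, :DX])
                     for w, m, C in zip(weights, means, covs)])
    resp /= resp.sum()
    y = np.zeros(DY)
    for r, m, C in zip(resp, means, covs):
        reg = C[DX:, :DX] @ np.linalg.inv(C[:DX, :DX])   # cross-covariance regression term
        y += r * (m[DX:] + reg @ (x - m[:DX]))
    return y

In this formulation the training step requires parallel sequences of hand motions and target vowel spectra, consistent with the space-mapping description in the abstract; at run time only the hand features are observed, so the mixture responsibilities are computed from the hand-space marginals of each component.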