A computer-simulated parallel formant synthesizer has been used to copy short samples of human speech. The synthetic speech can be made almost indistinguishable from the natural speech in spectrum, in waveform, and in earphone listening, provided that the synthetic glottal pulse is derived by inverse filtering a typical natural vowel from the same talker. Other pulse shapes have also been tried, such as the combination of cosine segments suggested by various workers as a close approximation to human glottal pulses. None of these idealized pulse shapes has been as successful as the pulses derived by inverse filtering in producing speech acceptable as natural. However, the subjective differences are small compared with those that reverberation would cause when listening to a loudspeaker in an ordinary room with good acoustics; under such listening conditions, the phase structure of glottal pulses has been demonstrated to be of no importance.
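As an illustration of the idealized "combination of cosine segments" pulse mentioned above, the following is a minimal sketch of one commonly cited form: a raised-cosine opening phase followed by a quarter-cosine closing phase and a closed (zero) phase. The function name `cosine_segment_pulse` and the parameters `open_frac` and `close_frac`, along with their default values, are assumptions made for this example; the exact pulse shapes and timing used in the study are not given here.

```python
import math


def cosine_segment_pulse(n_period, open_frac=0.4, close_frac=0.16):
    """Return one pitch period (a list of n_period samples) of an idealized
    glottal pulse built from two cosine segments.

    open_frac and close_frac are hypothetical illustration parameters: the
    fractions of the period spent in the opening and closing phases.
    """
    n_open = int(round(open_frac * n_period))    # length of rising segment
    n_close = int(round(close_frac * n_period))  # length of falling segment
    if n_open < 1 or n_close < 1 or n_open + n_close > n_period:
        raise ValueError("opening and closing phases must fit in one period")

    pulse = []
    for n in range(n_period):
        if n < n_open:
            # Opening phase: half-period raised cosine rising from 0 toward 1.
            value = 0.5 * (1.0 - math.cos(math.pi * n / n_open))
        elif n < n_open + n_close:
            # Closing phase: quarter-period cosine falling from 1 toward 0.
            value = math.cos(0.5 * math.pi * (n - n_open) / n_close)
        else:
            # Closed phase: no glottal flow.
            value = 0.0
        pulse.append(value)
    return pulse


# One period of 100 samples (for example, a 100 Hz pitch at a 10 kHz rate).
pulse = cosine_segment_pulse(100)
```

A pulse of this kind would serve as the excitation of the formant synthesizer in place of a pulse obtained by inverse filtering; the abstract reports that such idealized shapes were somewhat less successful in earphone listening.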