The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer

A computer-simulated parallel formant synthesizer has been used to copy short samples of human speech. The synthetic speech can be made almost indistinguishable from the natural speech in spectrum, in waveform, and in earphone listening, provided that the synthetic glottal pulse is derived by inverse filtering a typical natural vowel from the same talker. Other pulse shapes have also been tried, such as the combination of cosine segments that several workers have suggested as a close approximation to human glottal pulses. For producing speech acceptable as natural, none of these idealized pulse shapes has been as successful as the pulses derived by inverse filtering. However, the subjective differences are small compared with those that reverberation would cause when listening to a loudspeaker in an ordinary room with good acoustics; it has been demonstrated that under such listening conditions the phase structure of glottal pulses is of no importance.
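
As an illustration of what an idealized pulse built from cosine segments can look like, the following is a minimal sketch of one commonly described form: a rising half-cosine for the opening phase, a falling quarter-cosine for the closing phase, and zeros for the closed phase. The function name, the parameter names, and the particular segment shapes are assumptions made for illustration; the abstract does not specify the exact pulse shapes that were tested.

```python
import math


def cosine_segment_pulse(n_open, n_close, n_period):
    """Return one pitch period of an idealized glottal pulse (hypothetical helper).

    n_open   -- samples in the rising (opening) segment
    n_close  -- samples in the falling (closing) segment
    n_period -- total samples in the period; the remainder is the closed phase
    """
    if n_open <= 0 or n_close <= 0 or n_open + n_close > n_period:
        raise ValueError("need n_open > 0, n_close > 0, n_open + n_close <= n_period")

    pulse = []
    for n in range(n_period):
        if n < n_open:
            # Opening phase: half-cosine rising from 0 toward 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:
            # Closing phase: quarter-cosine falling from 1 toward 0.
            pulse.append(math.cos(0.5 * math.pi * (n - n_open) / n_close))
        else:
            # Closed phase: no glottal flow.
            pulse.append(0.0)
    return pulse


# Example values chosen arbitrarily: a 100-sample period with a
# 40-sample opening phase and a 16-sample closing phase.
pulse = cosine_segment_pulse(40, 16, 100)
```

Repeating such a period at the desired pitch would give a simple excitation signal for a synthesizer. A pulse obtained by inverse filtering natural speech would instead be taken from a recording rather than generated from a formula.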