STRAIGHT: A new speech synthesizer for vowel formant discrimination

The present study used a psychophysical test of formant discrimination to investigate whether STRAIGHT [Kawahara et al., Speech Commun. 27, 187–207 (1999)], a new tool for nearly natural speech synthesis, can be used for fine manipulation of vowel formants. Discrimination thresholds for F1 and F2 were estimated for an /ɛ/ vowel that was originally synthesized by KLTSYN [Klatt, J. Acoust. Soc. Am. 67, 971–995 (1980)] and then resynthesized by STRAIGHT. Thresholds for the vowels generated by KLTSYN and by STRAIGHT did not differ significantly. This result validates the use of STRAIGHT resynthesis to finely manipulate the formant frequencies of natural speech in speech perception experiments.
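The abstract does not state which adaptive procedure was used to estimate the thresholds, although the reference list includes Levitt's transformed up-down methods [1]. As an illustration only, the Python sketch below assumes a 2-down/1-up rule, which converges on the 70.7%-correct point of the psychometric function [1], and takes the threshold as the mean of the later reversal levels. The function name `run_staircase`, the starting shift, step size, reversal counts, and the simulated listener are all hypothetical and are not taken from the study.

```python
from typing import Callable, List, Tuple


def run_staircase(
    respond: Callable[[float], bool],
    start: float,
    step: float,
    n_reversals: int = 8,
    discard: int = 2,
    max_trials: int = 500,
) -> Tuple[float, List[Tuple[float, bool]]]:
    """Estimate a formant-shift threshold (Hz) with a 2-down/1-up staircase.

    respond(delta) returns True when the listener correctly picks the
    interval whose formant is shifted by `delta` Hz (hypothetical interface).
    The threshold is the mean of the reversal levels after discarding the
    first `discard` reversals (an assumed scoring rule).
    """
    delta = start
    correct_run = 0        # consecutive correct responses at the current level
    direction = None       # last step direction: "down" or "up"
    reversals: List[float] = []
    track: List[Tuple[float, bool]] = []

    for _ in range(max_trials):
        correct = respond(delta)
        track.append((delta, correct))
        if correct:
            correct_run += 1
            if correct_run < 2:
                continue   # need two correct in a row before stepping down
            correct_run = 0
            new_direction = "down"
        else:
            correct_run = 0
            new_direction = "up"  # one error makes the task easier

        # A change of direction marks a reversal at the current level.
        if direction is not None and new_direction != direction:
            reversals.append(delta)
        direction = new_direction
        if len(reversals) >= n_reversals:
            break

        delta = delta - step if new_direction == "down" else delta + step
        delta = max(delta, step)  # keep the formant shift positive

    used = reversals[discard:]
    if not used:
        raise ValueError("not enough reversals to estimate a threshold")
    return sum(used) / len(used), track


# Hypothetical deterministic listener: detects any shift of 20 Hz or more.
threshold, track = run_staircase(lambda d: d >= 20.0, start=100.0, step=10.0)
print(f"threshold ~ {threshold:.1f} Hz after {len(track)} trials")
# prints: threshold ~ 15.0 Hz after 30 trials
```

With this all-or-none listener the track settles into alternating between 10 and 20 Hz, so the estimate falls midway between them. A real listener responds probabilistically, and the same rule then tracks the shift that yields about 70.7% correct; running it separately on KLTSYN and STRAIGHT tokens would give the kind of paired thresholds the abstract compares.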

[1] H. Levitt, "Transformed up-down methods in psychoacoustics," The Journal of the Acoustical Society of America, 1971.

[2] D. B. Pisoni et al., "Comprehension of synthetic speech produced by rule: Word monitoring and sentence-by-sentence listening times," Human Factors, 1991.

[3] D. Kewley-Port, "Vowel formant discrimination II: Effects of stimulus uncertainty, consonantal context, and training," The Journal of the Acoustical Society of America, 2001.

[4] D. B. Pisoni et al., "Segmental intelligibility of synthetic speech produced by rule," The Journal of the Acoustical Society of America, 1989.

[5] J. Hawks, "Difference limens for formant patterns of vowel sounds," The Journal of the Acoustical Society of America, 1994.

[6] C. Watson et al., "Formant-frequency discrimination for isolated English vowels," The Journal of the Acoustical Society of America, 1994.

[7] R. B. Monsen et al., "The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction," Journal of Speech and Hearing Research, 1983.

[8] D. H. Klatt, "Software for a cascade/parallel formant synthesizer," The Journal of the Acoustical Society of America, 1980.

[9] D. Kewley-Port et al., "Vowel formant discrimination: Towards more ordinary listening conditions," The Journal of the Acoustical Society of America, 1999.

[10] H. Kawahara et al., "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, 1999.