Effect of intra-phrase position on acceptability of change in segment duration in sentence speech

Abstract: To provide a naturalness criterion for duration rules in speech synthesis, this study investigates how the acceptability to human listeners of a change in segment duration depends on the segment's temporal position within a phrase. Three perceptual experiments vary the attributes and context of a phrase in sentence speech: (1) the length of the phrase and the type of phrase accent (2 lengths × 3 types), (2) the carrier sentence (3 carriers plus 1 condition without a carrier), and (3) the position of the phrase in a breath group (2 positions). In total, 22 listeners evaluated the acceptability of resynthesized speech stimuli in which one vowel segment was either lengthened or shortened by up to 50 ms. Overall, a duration change in the phrase-initial segment is generally the least acceptable and a change in the phrase-final segment the most acceptable, with changes at intermediate positions falling in between. This position-dependent tendency is observed regardless of phrase length, accent type, choice of carrier sentence, presence or absence of a carrier sentence, and position in a breath group. These results suggest that the error criteria used in duration modeling should be reconsidered to take such perceptual characteristics into account, in order to improve the temporal naturalness of synthesized speech.
