Effect of speaking rate on the acceptability of change in segment duration

The acceptability of changes in segment duration at different speaking rates is studied to identify perceptual characteristics useful for designing an objective naturalness measure for speech synthesis. Building on a series of previous studies on the intra-phrase positional dependency of perceptual acceptability, we investigate three factors: (1) speaking rate, (2) position within a phrase, and (3) presence or absence of a carrier sentence. Experiment 1 uses three-mora (three-syllable) phrases spoken at three rates (fast, normal, and slow), with or without a carrier sentence. Seven listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments is lengthened or shortened by up to 50 ms. To interpret the results within a psychophysical, auditory-based framework rather than in terms of language-dependent features, Experiment 2 simplifies and replicates the temporal structures of the speech stimuli in a non-speech experiment and investigates the corresponding three factors. Ten listeners rate the difference between standard and comparison stimuli in which one of the durations is lengthened or shortened by up to 40 ms. The speech experiment shows that the acceptability of a given absolute change decreases as speaking rate increases; that is, listeners respond more sensitively to the same absolute duration change at a fast speaking rate than at a slow one. Similarly, the non-speech experiment shows that the detectability of a given absolute change increases as tempo increases. In addition, the speech experiment shows that the decline in acceptability differs with intra-phrase position at the three speaking rates, and the non-speech experiment likewise shows that detectability differs with temporal position at the three tempi.
These agreements between the speech and non-speech experiments suggest that the two share a common perceptual mechanism for processing temporal differences. On the other hand, the speech experiment shows no consistent effect of the presence or absence of a carrier sentence on the decline in acceptability, whereas the non-speech experiment shows, in several cases, that the presence of a carrier context can lower detectability.
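The rate effect described above can be illustrated with a small arithmetic sketch: the same absolute change is a larger fraction of a segment that is shorter to begin with. This is only an illustration and not the analysis used in the study. The base durations in `BASE_DURATION_MS` are hypothetical values chosen for the example, and the helper names `relative_change` and `relative_changes` are likewise assumptions.

```python
# Hypothetical unmodified segment durations (ms) at three speaking rates.
# These numbers are illustrative assumptions, not measurements from the study.
BASE_DURATION_MS = {"fast": 80.0, "normal": 120.0, "slow": 180.0}


def relative_change(base_ms: float, delta_ms: float) -> float:
    """Return the duration change as a fraction of the unmodified duration."""
    if base_ms <= 0:
        raise ValueError("base duration must be positive")
    return delta_ms / base_ms


def relative_changes(delta_ms: float) -> dict[str, float]:
    """Apply the same absolute change (ms) to each hypothetical base duration."""
    return {
        rate: relative_change(base_ms, delta_ms)
        for rate, base_ms in BASE_DURATION_MS.items()
    }


if __name__ == "__main__":
    # 50 ms is the largest modification mentioned for the speech stimuli.
    for rate, fraction in relative_changes(50.0).items():
        print(f"{rate:>6}: {fraction:.1%} of the base duration")
```

Under these assumed durations, a fixed 50 ms change is proportionally largest at the fast rate, which is at least consistent with the direction of the reported effect. Whether relative change fully accounts for the listeners' responses is a separate question that this sketch does not address.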
