Effects of phoneme class and duration on the acceptability of temporal modifications in speech.

Human listeners' subjective acceptability of durational distortions in speech segments or portions is significantly affected by various segmental and sequential properties, e.g., vowel color and temporal position in a word [Kato et al., J. Acoust. Soc. Am. 101, 2311-2322 (1997); 104, 540-549 (1998)]. The current study focused on the effects of phoneme class and of the original duration of speech portions in isolated words. Experiment 1 tested the effect of four phoneme classes, i.e., vowel, nasal, voiceless fricative, and silent closure, on the acceptable modification range. Six listeners evaluated the temporal acceptability of each of 49 words in which one steady-state portion was modified in duration from -75 ms (shortening) to +75 ms (lengthening) in 7.5-ms steps. The listeners' acceptable modification ranges were narrowest for vowels and widest for voiceless fricatives and silent closures, with nasals in between. The mean acceptable ranges for the least vulnerable classes, i.e., voiceless fricatives and silent closures, reached 143% or more of that for the most vulnerable class, i.e., vowels. The variation in the acceptable modification range across phoneme classes was highly correlated with the inherent loudness of each class: a larger inherent loudness yielded a narrower acceptable range. Experiment 2 tested the effect of the original (as-produced) duration of steady-state speech portions, using 30 words in which phoneme class and original duration were varied in a factorial design. The original durations affected the listeners' absolute acceptable ranges: the ranges were narrower for shorter original durations. There was a significant interaction between phoneme class and original duration; the effect of original duration was larger for vowels than for fricatives. This interaction could be accounted for by differences in the temporal structure extending beyond the modified portion itself.
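The abstract does not say how an "acceptable modification range" is read off the listeners' judgments. The Python sketch below shows one plausible procedure under stated assumptions: each stimulus on the -75 to +75 ms grid (7.5-ms steps) is assumed to have a mean acceptability rating, and the range is assumed to span the offsets around 0 ms where that rating stays at or above a chosen criterion, with each boundary located by linear interpolation between adjacent steps. The function names, the rating scale, the criterion, and the interpolation rule are hypothetical illustrations, not the authors' method, and the demo ratings are synthetic.

```python
import math

# Hypothetical sketch: only the modification grid (-75 to +75 ms in 7.5-ms
# steps) comes from the abstract; the rating scale, the criterion and the
# linear-interpolation rule are illustrative assumptions.


def modification_grid(step_ms=7.5, max_ms=75.0):
    """Duration offsets (ms) from -max_ms to +max_ms in step_ms increments."""
    n = round(max_ms / step_ms)
    return [k * step_ms for k in range(-n, n + 1)]


def _crossing(x0, y0, x1, y1, criterion):
    """Offset where the line through (x0, y0) and (x1, y1) equals criterion."""
    return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)


def acceptable_range(offsets, ratings, criterion):
    """Return (lower, upper, width) of the region extending outward from 0 ms
    in which the rating stays at or above criterion. offsets must be ascending
    and contain 0.0; ratings[i] is the (assumed) mean rating for offsets[i]."""
    i0 = offsets.index(0.0)
    if ratings[i0] < criterion:
        raise ValueError("unmodified stimulus is rated below the criterion")
    upper = offsets[-1]  # default if lengthening never falls below criterion
    for i in range(i0, len(offsets) - 1):
        if ratings[i + 1] < criterion:
            upper = _crossing(offsets[i], ratings[i],
                              offsets[i + 1], ratings[i + 1], criterion)
            break
    lower = offsets[0]  # default if shortening never falls below criterion
    for i in range(i0, 0, -1):
        if ratings[i - 1] < criterion:
            lower = _crossing(offsets[i], ratings[i],
                              offsets[i - 1], ratings[i - 1], criterion)
            break
    return lower, upper, upper - lower


def relative_range_percent(width, reference_width):
    """Express one class's range width as a percentage of a reference class's."""
    return 100.0 * width / reference_width


def pearson_r(xs, ys):
    """Pearson correlation, e.g. between per-class loudness and range width."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    grid = modification_grid()
    # Synthetic, symmetric rating profile (NOT data from the study):
    # 1.0 at the original duration, falling linearly with |offset|.
    ratings = [max(0.0, 1.0 - abs(x) / 50.0) for x in grid]
    lower, upper, width = acceptable_range(grid, ratings, criterion=0.5)
    print(len(grid), lower, upper, width)
```

Under this reading, the 143% figure would correspond to `relative_range_percent(width_fricative, width_vowel) >= 143`, and the reported loudness effect to a negative `pearson_r` between per-class inherent loudness and range width.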

[1] Jan P. H. van Santen et al. Assignment of segmental duration in text-to-speech synthesis, 1994, Comput. Speech Lang.

[2] Minoru Tsuzaki et al. Intensity effect on discrimination of auditory duration flanked by preceding and succeeding tones, 1994.

[3] L. Allan. The perception of time, 1979.

[4] G. Allen. The Location of Rhythmic Stress Beats in English: an Experimental Study I, 1972, Language and Speech.

[5] Yoshinori Sagisaka et al. On sentence-level factors governing segmental duration in Japanese, 1989.

[6] S. M. Abel et al. Duration discrimination of noise and tone bursts, 1972, The Journal of the Acoustical Society of America.

[7] Alfred B. Kristofferson et al. Psychophysical theories of duration discrimination, 1974.

[8] F. M. Henry et al. Discrimination of the duration of a sound, 1948, Journal of Experimental Psychology.

[9] A. Huggins et al. Just noticeable differences for segment duration in natural speech, 1969, The Journal of the Acoustical Society of America.

[10] J. Morton et al. Perceptual centers (P-centers), 1976.

[11] H. Kato et al. Acceptability for temporal modification of consecutive segments in isolated words, 1997, The Journal of the Acoustical Society of America.

[12] Minoru Tsuzaki et al. Discrimination of empty duration in the click sequence simulating a mora structure, 1994.

[13] R. Carlson et al. A Search for Durational Rules in a Real-Speech Data Base, 1986.

[14] T. Rammsayer et al. Effects of practice and signal energy on duration discrimination of brief auditory intervals, 1994, Perception & Psychophysics.

[15] Hisashi Kawai et al. Control of phoneme duration based on the movement of speech organs, 1993.

[16] K. B. Snell et al. Duration discrimination of speech and tonal complex stimuli by normally hearing and hearing-impaired listeners, 1988, The Journal of the Acoustical Society of America.

[17] Shigeru Katagiri et al. ATR Japanese speech database as a tool of speech recognition and synthesis, 1990, Speech Commun.

[18] Seiichiro Namba et al. Program for calculating loudness according to DIN 45631 (ISO 532B), 1991.

[20] Katarina Bartkova et al. A model of segmental duration for speech synthesis in French, 1987, Speech Commun.

[21] H. Kato et al. Acceptability for temporal modification of single vowel segments in isolated words, 1998, The Journal of the Acoustical Society of America.

[22] C. Douglas Creelman et al. Human Discrimination of Auditory Duration, 1962.