Modeling and perception of temporal characteristics in speech

This paper describes the characteristics of segmental duration control and their computational modeling, which we have studied for more than two decades in the context of speech synthesis. These studies not only contribute to prosody control in speech synthesis technology but also give an integrated view of the individual temporal characteristics that have been reported in phonetic science. Because the computational model can predict segmental durations in unseen contexts, it provides a new tool for analysis-by-synthesis of temporal characteristics. Furthermore, we present a series of experimental results on the perceptual characteristics of duration modifications. These perceptual experiments reveal that sensitivity to duration errors depends on context, and they show a strong correlation between duration errors and loudness, which suggests the existence of a language-universal temporal perception mechanism.
