ANALYSIS OF SPEECH AT DIFFERENT SPEAKING RATES USING EXCITATION SOURCE INFORMATION

When humans modify their speaking rate, they do not simply expand or compress the speech signal. To maintain the intelligibility and naturalness of speech, they modify some characteristics of the speech production mechanism in a complex way. As a result, the acoustic features extracted from the speech signal also change in a complex way, and these changes affect the performance of speech systems such as speech recognition and speaker recognition. Most studies on the effect of speaking rate on acoustic features focus on features at the segmental and suprasegmental levels. The present work analyzes the effects of speaking rate on features at the subsegmental level. Three subsegmental features, namely instantaneous fundamental frequency, strength of excitation at epochs, and perceived loudness, are chosen, and their variation with speaking rate is studied. Instantaneous fundamental frequency was observed to increase with increasing speaking rate, but when speaking rate is decreased, the changes in instantaneous fundamental frequency are speaker-specific. Similar observations were made for the strength of excitation at epochs: strength of excitation decreases with increasing speaking rate, and its change is speaker-specific when speaking rate is decreased. The effect of speaking rate on perceived loudness is also studied through perceptual loudness tests. Fast speech was perceived as louder than normal speech for the majority of speakers, whereas the difference in perceived loudness between normal and slow speech is speaker-specific. Speaking rate was also observed to have no significant effect on the objective loudness measure. A modified loudness measure for speech at different speaking rates is proposed, and its variations correlate with the results of the perceptual loudness tests.
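
As a rough illustration of the first feature, the sketch below computes instantaneous fundamental frequency as the reciprocal of the interval between successive epochs (glottal closure instants), then compares the mean value for two speaking rates. It assumes the epoch locations have already been obtained from some epoch extractor; the epoch detection method used in this work is not reproduced. The function names `instantaneous_f0` and `relative_change_percent`, the 8 kHz sampling rate, and the epoch sequences are hypothetical and purely illustrative, not data from this study.

```python
from statistics import mean


def instantaneous_f0(epochs, fs):
    """Return the instantaneous F0 (Hz) for each pair of successive epochs.

    epochs: strictly increasing sample indices of epochs (glottal closure
            instants), assumed to come from some epoch extractor.
    fs:     sampling rate in Hz.
    Each value is fs / (epochs[k+1] - epochs[k]), i.e. the reciprocal of
    one pitch period expressed in seconds.
    """
    if fs <= 0:
        raise ValueError("fs must be positive")
    f0 = []
    for start, end in zip(epochs, epochs[1:]):
        period = end - start  # pitch period in samples
        if period <= 0:
            raise ValueError("epochs must be strictly increasing")
        f0.append(fs / period)
    return f0


def relative_change_percent(reference, modified):
    """Percentage change in the mean of `modified` relative to `reference`."""
    ref = mean(reference)
    return 100.0 * (mean(modified) - ref) / ref


if __name__ == "__main__":
    fs = 8000  # hypothetical sampling rate (Hz)
    # Hypothetical epoch locations in samples; illustrative values only,
    # not measurements from this study.
    normal_epochs = [0, 64, 130, 194, 260, 324]
    fast_epochs = [0, 52, 102, 154, 204, 256]

    f0_normal = instantaneous_f0(normal_epochs, fs)
    f0_fast = instantaneous_f0(fast_epochs, fs)
    print("normal F0 (Hz):", [round(v, 1) for v in f0_normal])
    print("fast F0 (Hz):  ", [round(v, 1) for v in f0_fast])
    print("mean change (%):", round(relative_change_percent(f0_normal, f0_fast), 1))
```

Because each F0 value comes from a single epoch interval rather than from a fixed-length analysis frame, cycle-to-cycle variation is preserved. Averaging over an utterance, as `relative_change_percent` does, is only one simple way (assumed here) to summarize that variation when comparing speaking rates.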