Prosody Analysis for Speaker Affect Determination

Introduction

Speech is a complex waveform containing verbal (e.g., phoneme, syllable, and word) and nonverbal (e.g., speaker identity, emotional state, and tone) information. Both the verbal and nonverbal aspects of speech are extremely important in interpersonal communication and human-machine interaction. However, work in machine perception of speech has focused primarily on the verbal, or content-oriented, goals of speech recognition, speech compression, and speech labeling. Use of nonverbal information has been limited to speaker identification applications. While the success of research in these areas is well documented, this success is fundamentally limited by the effect of nonverbal information on the speech waveform. The extralinguistic aspect of speech is considered a source of variability that theoretically can be minimized with an appropriate preprocessing technique; determining such robust techniques is, however, far from trivial.
