Speaking Style Authentication Using Suprasegmental Hidden Markov Models

Speaking style authentication from human speech is gaining increasing attention from the engineering community. This interest stems from the demand to enhance both the naturalness and the efficiency of spoken-language human-machine interfaces. This work focuses on proposing, implementing, and testing speaker-dependent, text-dependent speaking style authentication (verification) systems that accept or reject the claimed identity of a speaking style based on suprasegmental hidden Markov models (SPHMMs). Using SPHMMs, the average speaking style authentication performance is 99%, 37%, 85%, 60%, 61%, 59%, 41%, 61%, and 57% for the neutral, shouted, slow, loud, soft, fast, angry, happy, and fearful speaking styles, respectively.
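The abstract does not spell out the SPHMM decision rule, but HMM-based verification systems typically accept or reject a claim by comparing the likelihood of the observed utterance under the claimed-style model against a background (anti) model and thresholding the ratio. The following is a minimal sketch of that generic scheme, not the paper's implementation: it uses a toy discrete-observation HMM with entirely hypothetical parameters, and the `forward_loglik`, `verify_style`, and threshold names are illustrative.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space for stability.
    pi: initial state probs (N,), A: transitions (N, N),
    B: emission probs (N, M), obs: symbol indices (T,)."""
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    alpha = logpi + logB[:, obs[0]]          # log alpha_1(j)
    for t in range(1, len(obs)):
        # log alpha_t(j) = log b_j(o_t) + logsumexp_i(alpha_{t-1}(i) + log a_ij)
        alpha = logB[:, obs[t]] + np.logaddexp.reduce(alpha[:, None] + logA, axis=0)
    return np.logaddexp.reduce(alpha)        # logsumexp over final states

def verify_style(obs, claimed_model, background_model, threshold=0.0):
    """Accept the claimed speaking style if the log-likelihood ratio
    against the background model exceeds the threshold."""
    llr = forward_loglik(obs, *claimed_model) - forward_loglik(obs, *background_model)
    return llr >= threshold

# Toy 2-state, 3-symbol models (hypothetical parameters).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B_claim = np.array([[0.5, 0.4, 0.1],
                    [0.1, 0.3, 0.6]])
B_bg = np.full((2, 3), 1 / 3)                # uninformative background emissions
obs = np.array([0, 1, 2, 1, 0])
decision = verify_style(obs, (pi, A, B_claim), (pi, A, B_bg))
```

In practice the claimed-style model would be trained per speaker and per style (here, from prosodic suprasegmental features rather than toy symbols), and the threshold tuned to trade off false acceptances against false rejections.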
