On the use of stress information in speech for speaker recognition

The performance of a speaker recognition system decreases when the speaker is under stress or emotion. In this paper we explore and identify a mechanism that enables use of inherent stress-in-speech or speaking style information present in speech of a person as additional cues for speaker recognition. We quantify the the inherent stress present in the speech of a speaker mainly using 3 features, namely, pitch, amplitude and duration (together called PAD) We experimentally observe that the PAD vectors of similar phones in different words of a speaker are close to each other in the three dimensional (PAD) space confirming that the way a speaker stresses different syllables in their speech is unique to them, thus we propose the use of PAD based speaking style of a speaker as an additional feature for speaker recognition applications.

[1]  Nengheng Zheng,et al.  Integration of Complementary Acoustic Features for Speaker Recognition , 2007, IEEE Signal Processing Letters.

[2]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[4]  Fangyu Hu,et al.  A Hierarchical Approach to Automatic Stress Detection in English Sentences , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[5]  Milan Sigmund,et al.  Spectral Analysis of Speech under Stress , 2007 .

[6]  John H. L. Hansen,et al.  Nonlinear analysis and classification of speech under stressed conditions , 1994 .

[7]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[8]  Pilar Prieto,et al.  Acoustic cues of stress and accent in Catalan , 2006 .

[9]  Mari Ostendorf,et al.  Automatic labeling of prosodic patterns , 1994, IEEE Trans. Speech Audio Process..

[10]  M. Sayadi,et al.  Text independent speaker recognition using the Mel frequency cepstral coefficients and a neural network classifier , 2004, First International Symposium on Control, Communications and Signal Processing, 2004..

[11]  Stephanie Seneff,et al.  Lexical stress modeling for improved speech recognition of spontaneous telephone speech in the jupiter domain , 2001, INTERSPEECH.

[12]  Saifur Rahman,et al.  SPEAKER IDENTIFICATION USING MEL FREQUENCY CEPSTRAL COEFFICIENTS , 2004 .

[13]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..