Paper information - On the use of stress information in speech for speaker recognition

On the use of stress information in speech for speaker recognition

The performance of a speaker recognition system degrades when the speaker is under stress or emotion. In this paper, we explore and identify a mechanism that uses the inherent stress-in-speech, or speaking style, information present in a person's speech as additional cues for speaker recognition. We quantify the inherent stress present in a speaker's speech mainly using three features, namely pitch, amplitude, and duration (together called PAD). We experimentally observe that the PAD vectors of similar phones in different words of a speaker are close to each other in the three-dimensional PAD space, confirming that the way a speaker stresses different syllables in their speech is unique to them. We therefore propose the use of the PAD-based speaking style of a speaker as an additional feature for speaker recognition applications.
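As a rough illustration of the closeness claim in the abstract, the sketch below treats each phone occurrence as a point (pitch, amplitude, duration) and compares average Euclidean distances within and across speakers. The function names, the z-score normalization, the distance measure, and the toy measurements are all assumptions made for illustration; none of them is taken from the paper.

```python
import math

def zscore_columns(vectors):
    """Scale each PAD dimension to zero mean and unit variance (assumed preprocessing)."""
    dims = list(zip(*vectors))
    means = [sum(d) / len(d) for d in dims]
    # Fall back to 1.0 when a dimension has zero spread, to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in d) / len(d)) or 1.0
        for d, m in zip(dims, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(v, means, stds)) for v in vectors]

def mean_pairwise_distance(vectors):
    """Average Euclidean distance over all unordered pairs (needs at least two vectors)."""
    pairs = [(a, b) for i, a in enumerate(vectors) for b in vectors[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_cross_distance(group_a, group_b):
    """Average Euclidean distance between every vector in group_a and every vector in group_b."""
    total = sum(math.dist(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))

# Hypothetical PAD measurements for one phone in three different words:
# (pitch in Hz, amplitude in dB, duration in seconds). Made-up values.
speaker_a = [(120.0, 62.0, 0.11), (123.0, 61.0, 0.12), (118.0, 63.0, 0.10)]
speaker_b = [(210.0, 70.0, 0.18), (205.0, 71.0, 0.19), (214.0, 69.0, 0.17)]

# Normalize over the pooled data so the three dimensions are comparable.
normalized = zscore_columns(speaker_a + speaker_b)
norm_a = normalized[:len(speaker_a)]
norm_b = normalized[len(speaker_a):]

within_a = mean_pairwise_distance(norm_a)
within_b = mean_pairwise_distance(norm_b)
across = mean_cross_distance(norm_a, norm_b)

print(f"within A: {within_a:.3f}, within B: {within_b:.3f}, across: {across:.3f}")
```

With toy clusters this well separated, the within-speaker averages come out smaller than the across-speaker average, which is the pattern the abstract describes. Real PAD measurements would need a pitch tracker, an energy estimate, and phone-level segmentation, none of which is shown here.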

Sunil Kumar Kopparapu | M. LaxmiNarayana
