Text and language independent speaker identification by using short-time low quality signals

Speaker identification applications that use voice signals recorded by wireless networks of small, low-power acoustic sensors are becoming feasible. However, the acoustic signals provided by these devices typically have a lower signal-to-noise ratio than those from wired microphone systems. In this paper, we present a text- and language-independent speaker identification algorithm based on a cepstral speech parameterization method. We analyze the robustness of the algorithm as the quality of the recorded voice signals decreases. We also investigate how the number of cepstral coefficients in the extracted feature vector and the resolution of the Discrete Fourier Transform affect the algorithm's performance. To bring the application as close to real time as possible, we propose a lightweight classification technique based on a simple yet effective similarity measure.
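
The pipeline summarized above (cepstral parameterization followed by a similarity-based classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the cepstral variant, the framing parameters, or the similarity measure, so the real cepstrum, the Hamming window, the averaging over frames, and the cosine similarity used here are all assumptions, as are the function names and default values. The parameters `n_coeffs` and `n_fft` stand in for the two quantities the abstract says are studied: the number of cepstral coefficients and the DFT resolution.

```python
import numpy as np

def real_cepstrum(frame, n_fft=512, n_coeffs=12):
    """First n_coeffs real-cepstrum coefficients of one frame (c0 is skipped)."""
    windowed = frame * np.hamming(len(frame))       # assumed window choice
    spectrum = np.fft.rfft(windowed, n=n_fft)       # n_fft sets the DFT resolution
    log_mag = np.log(np.abs(spectrum) + 1e-10)      # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n_fft)       # inverse DFT of the log magnitude
    return cepstrum[1:n_coeffs + 1]

def speaker_features(signal, frame_len=256, hop=128, n_fft=512, n_coeffs=12):
    """Average the per-frame cepstral vectors into one feature vector (assumed pooling)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    ceps = np.array([real_cepstrum(signal[s:s + frame_len], n_fft, n_coeffs)
                     for s in starts])
    return ceps.mean(axis=0)

def cosine_similarity(a, b):
    """Assumed similarity measure; the paper's own measure is not given in the abstract."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(test_vec, enrolled):
    """Return the enrolled speaker whose stored vector is most similar to test_vec."""
    return max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))
```

In use, one would call `speaker_features` on each enrollment recording to build a dictionary mapping speaker names to vectors, then pass the feature vector of a test recording to `identify`. Varying `n_coeffs` and `n_fft` in such a sketch mirrors the kind of sensitivity analysis the abstract describes, although the results reported in the paper depend on its own method and data.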
