Speaker characterization using long-term and temporal information

This paper presents new techniques for front-end analysis using long-term and temporal information for speaker recognition. We propose a long-term feature analysis strategy that averages short-time spectral features over a period of time in an effort to capture the speaker traits that are manifested over a speech segment longer than a spectral frame. We found that the moving averages of temporal information are effective in speaker recognition as well. The experiments on the 2008 NIST Speaker Recognition Evaluation dataset show the longterm and temporal information contribute to substantial EER reductions.

[1]  Mark J. F. Gales,et al.  Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[2]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[3]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Douglas A. Jones,et al.  Beyond Cepstra : Exploiting High-Level Information in Speaker Recognition , 2003 .

[5]  José L. Pérez-Córdoba,et al.  Histogram equalization of speech representation for robust speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[6]  Aaron E. Rosenberg,et al.  An improved endpoint detector for isolated word recognition , 1981 .

[7]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[8]  J. E. Porter,et al.  Normalizations and selection of speech segments for speaker recognition scoring , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[9]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[10]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[11]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[12]  Douglas A. Reynolds,et al.  A Tutorial on Text-Independent Speaker Verification , 2004, EURASIP J. Adv. Signal Process..

[13]  Olli Viikki,et al.  Cepstral domain segmental feature vector normalization for noise robust speech recognition , 1998, Speech Commun..

[14]  Jeff A. Bilmes,et al.  MVA Processing of Speech Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.