Long-term examination of intra-session and inter-session speaker variability

Session variability in speaker recognition is a well-recognized phenomenon, but it remains poorly understood, largely due to a dearth of robust longitudinal data. The current study uses a large, long-term speaker database to quantify both changes in speaker variability within a conversation and the impact of changes in speaker variability over the long term (three years). Results demonstrate that 1) the change in accuracy over the course of a conversation is statistically very robust and 2) the aging effect over three years is statistically negligible. Finally, we demonstrate that voice change during the course of a conversation is, in large part, comparable across sessions.

Index Terms: session variability, speaker recognition, speaker variability analysis, conversation analysis
