Unsupervised Clustering Approaches for Domain Adaptation in Speaker Recognition Systems

In this paper, we motivate and define the domain adaptation challenge task for speaker recognition. Starting from an i-vector system trained only on out-of-domain data, we propose a framework that uses large-scale clustering algorithms and unlabeled in-domain data to adapt the system for evaluation. Our initial findings from an empirical exploration of this problem suggest that, while perfect clustering yields the best results, imperfect clustering can still provide recognition performance within 15% of the optimal. We further present a system that achieves recognition performance comparable to that of a system given full knowledge of the domain mismatch. Throughout the paper, we also outline some of the many directions for future work that this new task opens.
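To make the idea of clustering-based adaptation concrete, the following is a minimal sketch under stated assumptions, not the system evaluated in the paper. It clusters synthetic unlabeled "in-domain" vectors by thresholding cosine similarity, treats the resulting cluster indices as pseudo speaker labels, and uses them to estimate a within-speaker variance. The synthetic data, the threshold value, the diagonal-variance simplification, the placeholder out-of-domain variance, and the interpolation weight `alpha` are all hypothetical choices made for illustration.

```python
import math
import random


def length_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity; inputs are assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))


def cluster_by_threshold(vectors, threshold):
    """Link every pair whose cosine similarity exceeds the threshold and
    return one label per vector for the resulting connected components."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(vectors))]


def within_cluster_variance(vectors, labels):
    """Per-dimension variance around each cluster mean, pooled over all vectors."""
    dim = len(vectors[0])
    groups = {}
    for v, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(v)
    scatter = [0.0] * dim
    for members in groups.values():
        mean = [sum(col) / len(members) for col in zip(*members)]
        for v in members:
            for d in range(dim):
                scatter[d] += (v[d] - mean[d]) ** 2
    return [s / len(vectors) for s in scatter]


def interpolate(in_domain, out_domain, alpha):
    """Hypothetical adaptation step: weighted mix of the two estimates."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(in_domain, out_domain)]


# Synthetic stand-in for unlabeled in-domain i-vectors (3 speakers, 5 sessions each).
rng = random.Random(0)
dim, n_speakers, n_sessions = 4, 3, 5
ivectors, true_labels = [], []
for spk in range(n_speakers):
    for _ in range(n_sessions):
        raw = [(1.0 if d == spk else 0.0) + rng.gauss(0.0, 0.05) for d in range(dim)]
        ivectors.append(length_normalize(raw))
        true_labels.append(spk)

pseudo_labels = cluster_by_threshold(ivectors, threshold=0.5)
in_domain_var = within_cluster_variance(ivectors, pseudo_labels)
out_domain_var = [0.01] * dim  # placeholder for an out-of-domain estimate
adapted_var = interpolate(in_domain_var, out_domain_var, alpha=0.5)
print(len(set(pseudo_labels)), adapted_var)
```

On real data the clusters would be imperfect, which is the situation the abstract's "within 15% of the optimal" finding addresses. Here the synthetic speakers are well separated, so `true_labels` is kept only to check the pseudo labels.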
