Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information

In real-life conditions, mismatch between development and test domain degrades speaker recognition performance. To solve the issue, many researchers explored domain adaptation approaches using matched in-domain dataset. However, adaptation would be not effective if the dataset is insufficient to estimate channel variability of the domain. In this paper, we explore the problem of performance degradation under such a situation of insufficient channel information. In order to exploit limited in-domain dataset effectively, we propose an unsupervised domain adaptation approach using Autoencoder based Domain Adaptation (AEDA). The proposed approach combines an autoencoder with a denoising autoencoder to adapt resource-rich development dataset to test domain. The proposed technique is evaluated on the Domain Adaptation Challenge 13 experimental protocols that is widely used in speaker recognition for domain mismatched condition. The results show significant improvements over baselines and results from other prior studies.

[1]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[2]  Douglas A. Reynolds,et al.  Domain Mismatch Compensation for Speaker Recognition Using a Library of Whiteners , 2015, IEEE Signal Processing Letters.

[3]  Douglas A. Reynolds,et al.  Data Selection for Within-Class Covariance Estimation , 2016, INTERSPEECH.

[4]  Yu Tsao,et al.  Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.

[5]  Sridha Sridharan,et al.  Dataset-invariant covariance normalization for out-domain PLDA speaker verification , 2015, INTERSPEECH.

[6]  Spyridon Matsoukas,et al.  Domain adaptation via within-class covariance correction in I-vector based speaker recognition systems , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Hagai Aronowitz,et al.  Inter dataset variability compensation for speaker recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[9]  Dong Liu,et al.  Robust visual domain adaptation with low-rank reconstruction , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Lukás Burget,et al.  Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Yasuo Horiuchi,et al.  Reverberant speech recognition based on denoising autoencoder , 2013, INTERSPEECH.

[12]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[13]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[14]  Eduardo Lleida,et al.  Unsupervised adaptation of PLDA by using variational Bayes methods , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Niko Brümmer,et al.  Unsupervised Domain Adaptation for I-Vector Speaker Recognition , 2014, Odyssey.

[16]  Alan McCree,et al.  Supervised domain adaptation for I-vector based speaker recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Sergey Novoselov,et al.  A Speaker Recognition System for the SITW Challenge , 2016, INTERSPEECH.

[19]  Hanseok Ko,et al.  Maximum likelihood Linear Dimension Reduction of heteroscedastic feature for robust Speaker Recognition , 2015, 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[20]  Hagai Aronowitz Compensating Inter-Dataset Variability in PLDA Hyper-Parameters for Robust Speaker Recognition , 2014, Odyssey.

[21]  Hanseok Ko,et al.  KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation , 2017, ArXiv.

[22]  Douglas A. Reynolds,et al.  Unsupervised Clustering Approaches for Domain Adaptation in Speaker Recognition Systems , 2014, Odyssey.

[23]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Kilian Q. Weinberger,et al.  Marginalized Denoising Autoencoders for Domain Adaptation , 2012, ICML.

[25]  Sridha Sridharan,et al.  Improving out-domain PLDA speaker verification using unsupervised inter-dataset variability compensation approach , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).