Speaker Verification Channel Compensation Based on DAE-RBM-PLDA

In speaker recognition, a model combining a Deep Neural Network (DNN), the identity vector (i-vector), and Probabilistic Linear Discriminant Analysis (PLDA) has proved very effective. To further improve the PLDA recognition model, a Denoising Autoencoder (DAE), a Restricted Boltzmann Machine (RBM), and their combination (DAE-RBM) are applied to channel compensation for the PLDA model, with the aim of minimizing the effect of channel information in the speaker i-vector space. Our experimental results indicate that the Equal Error Rate (EER) and the minimum Detection Cost Function (minDCF) of DAE-PLDA and RBM-PLDA are significantly lower than those of the standard PLDA system. DAE-RBM-PLDA, which combines the advantages of both, improves the system's recognition performance further.
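
For readers unfamiliar with the two metrics, the following is a minimal sketch of how EER and a normalized minDCF can be computed from verification trial scores. It is not the evaluation code used in this work: the function names are hypothetical, and the cost parameters `p_target`, `c_miss`, and `c_fa` are assumed defaults, since the operating point is not stated here.

```python
import numpy as np


def error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every observed score threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    # A trial is accepted when its score is >= the threshold.
    miss = np.array([(tgt < t).mean() for t in thresholds])
    false_alarm = np.array([(non >= t).mean() for t in thresholds])
    return thresholds, miss, false_alarm


def equal_error_rate(target_scores, nontarget_scores):
    """EER: the error rate where miss and false-alarm rates are closest."""
    _, miss, false_alarm = error_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0


def min_dcf(target_scores, nontarget_scores,
            p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over the observed thresholds.

    p_target, c_miss and c_fa are assumed values, not taken from the paper.
    """
    _, miss, false_alarm = error_rates(target_scores, nontarget_scores)
    dcf = c_miss * p_target * miss + c_fa * (1.0 - p_target) * false_alarm
    # Normalize by the cost of the best trivial system (accept or reject all).
    norm = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return dcf.min() / norm
```

Lower values are better for both metrics, so a compensated system should score lower than the PLDA baseline when evaluated on the same trial list.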
