Alternative Approaches to Neural Network Based Speaker Verification

As in other areas of automatic speech processing, feature extraction based on bottleneck neural networks has recently proven very effective for speaker verification. However, the better results are usually reported with more complex neural network architectures (e.g. stacked bottlenecks), which are difficult to reproduce. In this work, we experiment with so-called deep features, which are based on a simple feed-forward neural network architecture. We study various ways of applying deep features to i-vector/PLDA based speaker verification. With proper settings, this simple architecture can yield better verification performance than the more elaborate bottleneck features. We also experiment with multi-task training, where the neural network is trained on both speaker recognition and senone recognition objectives. Results indicate that, with careful weighting of the two objectives, multi-task training can yield significantly better-performing deep features.
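The weighting of the two objectives in multi-task training can be illustrated with a minimal sketch. The abstract does not specify the exact loss, so the interpolation form below, the weight name `alpha`, and all function names are assumptions made purely for illustration: the combined loss is taken to be a convex combination of the frame-level cross-entropy over speaker targets and the cross-entropy over senone targets, computed from two output layers that share the same hidden layers.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct class per frame.
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()

def multitask_loss(speaker_logits, speaker_labels,
                   senone_logits, senone_labels, alpha=0.5):
    # Hypothetical weighting: alpha scales the speaker objective,
    # (1 - alpha) scales the senone objective.
    loss_speaker = cross_entropy(speaker_logits, speaker_labels)
    loss_senone = cross_entropy(senone_logits, senone_labels)
    return alpha * loss_speaker + (1.0 - alpha) * loss_senone

# Toy batch: 3 frames, 4 speaker classes, 8 senone classes.
speaker_logits = np.zeros((3, 4))
senone_logits = np.zeros((3, 8))
speaker_labels = np.array([0, 1, 2])
senone_labels = np.array([5, 6, 7])

loss = multitask_loss(speaker_logits, speaker_labels,
                      senone_logits, senone_labels, alpha=0.25)
```

Under this assumed form, `alpha = 1` reduces to pure speaker-discriminative training and `alpha = 0` to pure senone training; intermediate values trade off the two objectives.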
