Paper Information - Speaker Verification based on Deep Neural Network for Text-Constrained Short Commands


Speaker verification is known to be a difficult task, especially for short utterances. Based on the observation that actual voice commands are composed of a few repeated words, we propose an effective approach for building and training a deep neural network to extract features with properties suited to this condition. We demonstrate its effectiveness through experiments designed independently for each property. Our proposed approach achieves a 5.89% equal error rate on word-scale commands shorter than 1 second; with linear discriminant analysis, the error rate decreases to 3.43%.

Kiyoung Choi | Heesu Kim | Euntae Choi

[1] Patrick Kenny, et al. A Study of Interspeaker Variability in Speaker Verification, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[2] Pete Warden, et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition, 2018, ArXiv.

[3] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[4] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.

[5] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.

[6] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[7] Gerald Penn, et al. Convolutional Neural Networks for Speech Recognition, 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[8] Thomas Fang Zheng, et al. Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Georg Heigold, et al. End-to-end text-dependent speaker verification, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11] Yun Lei, et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[12] Sridha Sridharan, et al. Making Confident Speaker Verification Decisions With Minimal Speech, 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[14] Frank K. Soong, et al. DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances, 2017, INTERSPEECH.

[15] Erik McDermott, et al. Deep neural networks for small footprint text-dependent speaker verification, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16] Dong Wang, et al. Deep Speaker Feature Learning for Text-Independent Speaker Verification, 2017, INTERSPEECH.

[17] Nasser M. Nasrabadi, et al. Text-Independent Speaker Verification Using 3D Convolutional Neural Networks, 2017, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[18] Sanjeev Khudanpur, et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification, 2017, INTERSPEECH.

[19] Shuai Wang, et al. What Does the Speaker Embedding Encode?, 2017, INTERSPEECH.

[20] Wei Li, et al. Centroid-aware local discriminative metric learning in speaker verification, 2017, Pattern Recognit.

[21] Ning Chen, et al. Feature sparsity analysis for i-vector based speaker verification, 2016, Speech Commun.

[22] Thomas Fang Zheng, et al. Deep Speaker Vectors for Semi Text-independent Speaker Verification, 2015, ArXiv.