Voice biometrics: Deep learning-based voiceprint authentication system

Speaker identification systems are becoming more important, especially as devices increasingly rely on spoken commands from the user. This article analyzes how a text-independent voice identification system can be built. Mel-frequency cepstral coefficient (MFCC) extraction is evaluated as the feature extraction step, and a support vector machine is trained and tested on two data sets: one drawn from LibriSpeech and one of audio files recorded in-house. The results show that such systems can be used for both speaker identification and speaker verification tasks.
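As a rough illustration of the feature extraction step mentioned above, the following is a minimal NumPy sketch of MFCC computation. It is not the article's implementation. The function names and all parameter values (16 kHz sampling, 25 ms frames, 10 ms step, 0.97 pre-emphasis, 26 mel filters, 13 coefficients) are assumptions chosen as common defaults.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return log_energies @ basis.T

# Usage on synthetic noise standing in for one second of speech.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
utterance_vector = features.mean(axis=0)  # one fixed-length vector per utterance
```

Averaging the frames into a single vector per utterance is only one simple way to obtain fixed-length input for a classifier such as a support vector machine. The abstract does not say how the article aggregates frames.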
