Speaker Recognition Using Wavelet Packet Entropy, I-Vector, and Cosine Distance Scoring

A growing number of people now benefit from speaker recognition. However, its accuracy often drops sharply when the speech is of low quality or corrupted by noise. This paper proposes a new speaker recognition model based on wavelet packet entropy (WPE), i-vectors, and cosine distance scoring (CDS). In the proposed model, WPE transforms speech into noise-resistant short-term spectral feature vectors (short vectors). An i-vector is generated from these short vectors to characterize each utterance and improve recognition accuracy. CDS then quickly measures the difference between two i-vectors to produce the recognition result. The model is evaluated on the TIMIT speech database. Experimental results show that it performs well in both clean and noisy environments and is insensitive to low-quality speech, but its time cost is high. To reduce this cost, parallel computation is used.
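The following is a minimal sketch of the two ends of such a pipeline, written under stated assumptions rather than taken from the paper. A speech frame is split into wavelet packet sub-bands using the Haar filter pair, the Shannon entropy of each sub-band's normalized coefficient energies forms the short vector, and two fixed-length vectors are compared by cosine similarity. The wavelet family, the decomposition depth, the entropy definition, and all function names (`wpe_vector`, `cosine_score`, the `levels` parameter) are hypothetical choices for illustration. I-vector extraction is omitted because it needs a trained universal background model and total variability matrix; in the full model the cosine score is computed between i-vectors, not between raw WPE vectors.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar filter bank: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_nodes(frame: Sequence[float], levels: int) -> List[List[float]]:
    """Full wavelet packet tree: split every node (not only the approximation)
    at each level, giving 2**levels sub-bands in natural (Paley) order."""
    nodes = [list(frame)]
    for _ in range(levels):
        next_nodes: List[List[float]] = []
        for node in nodes:
            a, d = haar_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes


def shannon_entropy(coeffs: Sequence[float]) -> float:
    """Assumed WPE definition: entropy of the normalized coefficient energies."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # a silent sub-band carries no information
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0.0)


def wpe_vector(frame: Sequence[float], levels: int = 3) -> List[float]:
    """Short vector for one frame: one entropy value per wavelet packet sub-band."""
    if levels < 0 or len(frame) % (2 ** levels) != 0:
        raise ValueError("frame length must be divisible by 2**levels")
    return [shannon_entropy(node) for node in wavelet_packet_nodes(frame, levels)]


def cosine_score(w1: Sequence[float], w2: Sequence[float]) -> float:
    """Cosine similarity between two fixed-length vectors (e.g. i-vectors)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (n1 * n2)


if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
    features = wpe_vector(frame, levels=3)
    print(len(features))  # 8 sub-band entropies for a 3-level decomposition
```

In a verification setting, the cosine score between an enrollment i-vector and a test i-vector would be compared with a decision threshold tuned on development data: a higher score suggests the two utterances come from the same speaker. Because scoring is a single normalized dot product, it is cheap relative to feature and i-vector extraction.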
