Multi-dimensional I-vector closed set speaker identification based on an extreme learning machine with and without fusion technologies

In this article, the I-vector approach to Speaker Identification (SID) is exploited as a compact, low-dimensional, fixed-length, state-of-the-art representation. The study is built on four feature combinations based on Power Normalized Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features, with two different, previously proposed compensation approaches. The main system is modelled by low-dimensional I-vectors, and we also propose fusion strategies that use different, higher I-vector dimensions to improve the recognition rate. In addition, cumulative, concatenated, and interleaved fusion techniques are investigated to improve on the conventional late fusion presented in our previous work. Moreover, the proposed system employs an Extreme Learning Machine (ELM) for classification, which is efficient, less complex, and less time-consuming than traditional neural-network-based approaches. The system is evaluated on the TIMIT database in clean and additive white Gaussian noise (AWGN) conditions, achieving recognition rates of 96.67% and 80.83%, respectively. Compared with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) in our previously proposed scheme, the system improves the recognition rate by 1.76% for clean speech and 2.1% at 30 dB AWGN, with the largest improvement, 43.81%, at 10 dB.
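To illustrate why an ELM is cheap to train, the following is a minimal sketch of a generic single-hidden-layer ELM classifier applied to fixed-length vectors. It is not the configuration used in the paper: the synthetic data standing in for I-vectors, the hidden-layer size, the tanh activation, the ridge term `reg`, and the names `train_elm` and `predict_elm` are all assumptions made for illustration. The hidden weights are drawn at random and never updated; only the output weights are fitted, in closed form, by regularised least squares.

```python
import numpy as np

def train_elm(X, y, n_classes, n_hidden=64, reg=1e-3, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random input weights
    b = rng.standard_normal(n_hidden)                # fixed random biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    T = np.eye(n_classes)[y]                         # one-hot speaker targets
    # Output weights from ridge-regularised least squares (no backpropagation).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(model, X):
    """Return the index of the highest-scoring class for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

def make_vectors(centres, labels, rng, noise=0.1):
    """Synthetic stand-in for I-vectors: noisy class centres, length-normalised."""
    X = centres[labels] + noise * rng.standard_normal((labels.size, centres.shape[1]))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

rng = np.random.default_rng(42)
n_speakers, dim = 4, 20                              # hypothetical sizes
centres = rng.standard_normal((n_speakers, dim))
y_train = np.repeat(np.arange(n_speakers), 30)
y_test = np.repeat(np.arange(n_speakers), 10)
X_train = make_vectors(centres, y_train, rng)
X_test = make_vectors(centres, y_test, rng)

model = train_elm(X_train, y_train, n_classes=n_speakers)
preds = predict_elm(model, X_test)
accuracy = float(np.mean(preds == y_test))
print(f"closed-set identification accuracy: {accuracy:.2f}")
```

Because training reduces to one linear solve, the cost is dominated by forming and inverting an `n_hidden` by `n_hidden` matrix, which is the sense in which an ELM is less time-consuming than iteratively trained networks.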
