Robust remote speaker recognition system based on AR-MFCC features and efficient speech activity detection algorithm

A remote text-independent automatic speaker recognition system is proposed for communication channels in VoIP applications. The proposed system employs a robust speech feature together with an efficient speech activity detection algorithm and a Gaussian mixture model (GMM). The Mel-frequency cepstral coefficient (MFCC) is a very useful feature for speech processing in clean conditions, but its performance deteriorates in the presence of noise. A feature extraction framework based on the well-known MFCC and autoregressive (AR) model features is therefore proposed. The TIMIT database, with speech from 630 speakers, was used in Matlab simulations. For each speaker, the first four utterances were defined as the training set and one utterance as the test set. The AR-MFCC approach provided significant improvements in identification rate over MFCC in noisy environments. In terms of runtime, however, AR-MFCC takes longer to execute than MFCC.
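The abstract does not say how the AR features are computed or how they are combined with MFCC, so the following is only a minimal sketch of the AR part in isolation: estimating autoregressive coefficients for one speech frame with the autocorrelation method and the Levinson-Durbin recursion. The function names `autocorr` and `levinson_durbin`, the model order, and the synthetic test signal are illustrative assumptions, not the authors' Matlab implementation.

```python
import random


def autocorr(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - k] for t in range(k, n)) / n
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Convention: x[t] is predicted as sum(a[k-1] * x[t-k] for k = 1..order).
    Returns (a, err), where err is the final prediction-error power.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Hypothetical stand-in for a speech frame: a synthetic AR(2) signal.
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(2000):
        x.append(0.75 * x[-1] - 0.5 * x[-2] + rng.gauss(0.0, 1.0))
    coeffs, residual = levinson_durbin(autocorr(x, 2), 2)
    print("estimated AR coefficients:", coeffs)
    print("prediction-error power:", residual)
```

In a pipeline like the one described, coefficients of this kind would be computed per frame, presumably after speech activity detection has discarded non-speech frames; how they are then merged with the MFCC vector is not specified in the abstract.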
