Frame Level Based Algorithm

In this paper, we propose an algorithm to improve the performance of speaker identification systems. A baseline speaker identification system scores a test utterance against every speaker's model, which can be described as evaluation at the observation level. In the proposed approach, an algorithm based on frame-level evaluation is applied before the standard evaluation phase. The speaker identification study is conducted using the IVIE corpus and 120 randomly selected speakers from TIMIT. Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM) are the main components of state-of-the-art speaker identification systems and are adopted in this work. Experimental results from several systems with different training and testing conditions show that the proposed algorithm yields relative reductions in error rate of 24.4% and 37.3% over the baseline systems for IVIE and TIMIT, respectively. The final identification error rates are 3.4% and 5.2% for the IVIE and TIMIT corpora, respectively.
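
The observation-level baseline described above can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs and pre-extracted feature frames (for example MFCC vectors); the function names, the model layout, and the toy two-speaker data are hypothetical and chosen only for illustration. It shows the baseline scoring only, not the proposed frame-level algorithm, whose details are not given in this abstract.

```python
import numpy as np

def frame_log_likelihoods(frames, weights, means, variances):
    """Log-likelihood of each frame under one diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,);
    means and variances: (M, D). All names are illustrative.
    """
    diff = frames[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)    # (T, M)
    weighted = log_comp + np.log(weights)                                # (T, M)
    # Numerically stable log-sum-exp over the mixture components.
    peak = weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def identify_speaker(frames, speaker_models):
    """Observation-level scoring: sum the frame log-likelihoods over the
    whole utterance for every speaker model and pick the best total."""
    scores = {
        name: float(frame_log_likelihoods(frames, *params).sum())
        for name, params in speaker_models.items()
    }
    return max(scores, key=scores.get), scores

# Toy data (assumed, not from the paper): two single-component models.
speaker_models = {
    "spk_a": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "spk_b": (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]])),
}
test_frames = np.full((10, 2), 5.0)  # ten identical 2-D frames at spk_b's mean
best, scores = identify_speaker(test_frames, speaker_models)
```

In this sketch every frame contributes to the utterance score; a frame-level stage such as the one proposed would operate on the per-frame values returned by `frame_log_likelihoods` before the final decision is made.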
