Speaker Identification using MFCC-Domain Support Vector Machine

Speech recognition and speaker identification are important for authentication and verification in security applications, but both are difficult to achieve. Speaker identification methods can be divided into text-dependent and text-independent methods. This paper presents a technique for text-dependent speaker identification using an MFCC-domain support vector machine (SVM). In this work, mel-frequency cepstrum coefficients (MFCCs) and their statistical distribution properties are used as features, which serve as inputs to the SVM. This work first applies the sequential minimal optimization (SMO) learning technique to SVM training, which improves performance over the traditional Chunking and Osuna training techniques. The cepstrum coefficients representing the speaker characteristics of a speech segment are computed by nonlinear (mel-scale) filter bank analysis followed by a discrete cosine transform. The speaker identification ability and convergence speed of the SVMs are investigated for different combinations of features. Experimental results on several speech samples show the effectiveness of the proposed approach.
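The abstract summarises MFCC extraction as nonlinear filter bank analysis followed by a discrete cosine transform. The NumPy sketch below illustrates that pipeline under common textbook settings: 25 ms Hamming-windowed frames with a 10 ms hop, a 512-point FFT, 26 triangular mel filters and 13 coefficients. These values, the mel formula, and the helper names `mel_filterbank`, `dct_ii`, `mfcc` and `utterance_features` are illustrative assumptions, not the paper's configuration. The per-coefficient mean and standard deviation in `utterance_features` are a hypothetical stand-in for the unspecified "statistical distribution properties".

```python
import numpy as np


def hz_to_mel(f):
    # A widely used mel formula; assumed here, the abstract does not name one.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (the nonlinear filter bank)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak value 1 at the center bin
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def mfcc(signal, sample_rate, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) matrix of MFCCs for a 1-D signal."""
    size = int(round(frame_len * sample_rate))
    step = int(round(frame_step * sample_rate))
    # Zero-pad signals shorter than one frame so at least one full frame exists.
    if len(signal) < size:
        signal = np.pad(signal, (0, size - len(signal)))
    n_frames = 1 + (len(signal) - size) // step
    idx = step * np.arange(n_frames)[:, None] + np.arange(size)[None, :]
    frames = signal[idx] * np.hamming(size)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum per frame
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return dct_ii(log_energies, n_coeffs)


def utterance_features(coeffs):
    """Hypothetical fixed-length summary: per-coefficient mean and standard deviation."""
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])
```

For a one-second utterance sampled at 16 kHz, `mfcc` yields 98 frames of 13 coefficients, and `utterance_features` turns that variable-length matrix into a single 26-dimensional vector that a fixed-input classifier such as an SVM can consume.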

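SMO trains an SVM by repeatedly selecting two Lagrange multipliers and solving that two-variable subproblem in closed form, so no numerical quadratic-programming solver is needed; Chunking and Osuna's decomposition method instead solve larger QP subproblems numerically. The sketch below is the simplified SMO variant, which picks the second multiplier at random rather than with Platt's full selection heuristics. The RBF kernel, the values of `C` and `gamma`, the one-vs-rest scheme for identifying among several speakers, and all function names are assumptions, since the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def smo_train(X, y, C=1.0, gamma=0.5, tol=1e-3, max_passes=10, max_iter=1000, seed=0):
    """Simplified SMO for a binary SVM; y holds labels in {-1.0, +1.0}.

    Each step solves the two-multiplier subproblem analytically while keeping
    sum(alpha * y) == 0 and 0 <= alpha <= C.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    alpha, b = np.zeros(n), 0.0
    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = (alpha * y) @ K[:, i] + b - y[i]  # prediction error on sample i
            # Only optimise multipliers that violate the KKT conditions.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(n - 1))
            if j >= i:  # uniform random j != i (Platt's heuristics are omitted)
                j += 1
            E_j = (alpha * y) @ K[:, j] + b - y[j]
            a_i, a_j = alpha[i], alpha[j]
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2.0 * K[i, j] - K[i, i] - K[j, j]  # negative when the objective is strictly concave
            if L == H or eta >= 0:
                continue
            new_j = float(np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H))
            if abs(new_j - a_j) < 1e-5:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)  # preserves sum(alpha * y)
            alpha[i], alpha[j] = new_i, new_j
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i, i] - y[j] * (new_j - a_j) * K[i, j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i, j] - y[j] * (new_j - a_j) * K[j, j]
            b = b1 if 0 < new_i < C else b2 if 0 < new_j < C else (b1 + b2) / 2.0
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision(X_train, y, alpha, b, X_new, gamma=0.5):
    """SVM score f(x) = sum_i alpha_i * y_i * K(x_i, x) + b for each row of X_new."""
    return (alpha * y) @ rbf_kernel(X_train, X_new, gamma) + b


def train_speaker_models(X, labels, C=1.0, gamma=0.5):
    """One-vs-rest: one binary SVM per enrolled speaker (an assumed multi-class scheme)."""
    models = {}
    for speaker in np.unique(labels):
        y = np.where(labels == speaker, 1.0, -1.0)
        models[speaker] = (y,) + smo_train(X, y, C=C, gamma=gamma)
    return models


def identify(X_train, models, X_new, gamma=0.5):
    """Assign each row of X_new to the speaker whose SVM gives the largest score."""
    speakers = list(models)
    scores = np.stack([decision(X_train, *models[s], X_new, gamma) for s in speakers])
    return [speakers[k] for k in scores.argmax(axis=0)]
```

Each row of `X` would be one utterance's fixed-length feature vector (for example, per-utterance MFCC statistics), and `labels` the enrolled speaker names. One-vs-rest is used only because it is the simplest way to extend a binary SVM to several speakers.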