STUDY OF SPEAKER RECOGNITION SYSTEMS

Speaker Recognition is the computing task of validating a user’s claimed identity using characteristics extracted from their voices. This technique is one of the most useful and popular biometric recognition techniques in the world especially related to areas in which security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition and a number of related activities. Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. The process of Speaker recognition consists of 2 modules namely: - feature extraction and feature matching. Feature extraction is the process in which we extract a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves identification of the unknown speaker by comparing the extracted features from his/her voice input with the ones from a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the Short Term FFT, extracting its features and matching it with a stored template. Cepstral Coefficient Calculation and Mel frequency Cepstral Coefficients (MFCC) are applied for feature extraction purpose. VQLBG (Vector Quantization via Linde-Buzo-Gray), DTW (Dynamic Time Warping) and GMM (Gaussian Mixture Modelling) algorithms are used for generating template and feature matching purpose.

[1]  Douglas A. Reynolds,et al.  A Gaussian mixture modeling approach to text-independent speaker identification , 1992 .

[2]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[3]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[4]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[5]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[6]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[7]  李幼升,et al.  Ph , 1989 .

[8]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..

[9]  Richard M. Schwartz,et al.  The application of probability density estimation to text-independent speaker identification , 1982, ICASSP.

[10]  Zhao Rongchun,et al.  An overview of modeling technology of speaker recognition , 2003, International Conference on Neural Networks and Signal Processing, 2003. Proceedings of the 2003.

[11]  Richard J. Mammone,et al.  Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions , 1998, IEEE Trans. Speech Audio Process..

[12]  M. Sayadi,et al.  Text independent speaker recognition using the Mel frequency cepstral coefficients and a neural network classifier , 2004, First International Symposium on Control, Communications and Signal Processing, 2004..

[13]  Sridha Sridharan,et al.  Telephone based speaker recognition using multiple binary classifier and Gaussian mixture models , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  D.P. Skinner,et al.  The cepstrum: A guide to processing , 1977, Proceedings of the IEEE.

[15]  Pradeep Ch Text dependent speaker recognition using MFCC and LBG VQ , 2007 .

[16]  Eamonn J. Keogh,et al.  Derivative Dynamic Time Warping , 2001, SDM.