Speaker Identification using Row Mean of Haar and Kekre’s Transform on Spectrograms of Different Frame Sizes

In this paper, we propose a speaker identification technique based on two transforms: the Haar transform and Kekre’s transform. Each speaker's speech signal is converted into a spectrogram using 25% and 50% overlap between consecutive sample vectors. Each transform is applied to the spectrogram, and the row mean of the transformed matrix forms the feature vector used in both the training and matching phases. The results of the two transforms are compared. The Haar transform reaches a maximum accuracy of 69% for both 25% and 50% overlap. Kekre’s transform performs considerably better, with a maximum accuracy of 85.7% for 25% overlap and 88.5% for 50% overlap.
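
The pipeline described in the abstract (spectrogram, transform, row mean, matching) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. Several details are assumptions that the abstract does not state: the frame size of 256 samples, the use of FFT magnitudes for the spectrogram, applying the transform along the frequency axis only, and Euclidean distance for matching. The function names are hypothetical. The Kekre matrix follows the commonly published definition (ones on and above the diagonal, `-N + (x - 1)` on the sub-diagonal for 1-based row `x`, zeros below).

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal Haar matrix of size n x n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        m = h.shape[0]
        top = np.kron(h, np.array([[1.0, 1.0]]))           # averaging rows
        bottom = np.kron(np.eye(m), np.array([[1.0, -1.0]]))  # difference rows
        h = np.vstack([top, bottom])
    return h / np.linalg.norm(h, axis=1, keepdims=True)


def kekre_matrix(n):
    """Kekre's transform matrix (rows are orthogonal, not normalised)."""
    k = np.triu(np.ones((n, n)))
    for r in range(1, n):
        k[r, r - 1] = -n + r  # sub-diagonal entry, r is the 0-based row
    return k


def spectrogram(signal, frame_size=256, overlap=0.5):
    """Magnitude spectrogram, shape (frame_size // 2, n_frames).

    Assumption: frames are FFT'd without windowing and only the first
    half of the spectrum is kept so the row count stays a power of two.
    """
    hop = int(frame_size * (1 - overlap))
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_size] for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.fft(frames, axis=0))[:frame_size // 2]


def row_mean_feature(spec, transform):
    """Apply the transform to the spectrogram and average each row."""
    return (transform @ spec).mean(axis=1)


def identify(probe, enrolled):
    """Return the enrolled speaker whose feature vector is closest.

    Assumption: Euclidean distance is used as the similarity measure.
    """
    return min(enrolled, key=lambda name: np.linalg.norm(probe - enrolled[name]))
```

In this sketch, training would store `row_mean_feature(spectrogram(x, 256, overlap), T)` for each enrolled speaker, with `T` set to `haar_matrix(128)` or `kekre_matrix(128)` and `overlap` set to 0.25 or 0.5; matching passes the test utterance's feature vector to `identify`.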
