Performance Evaluation of Statistical Approaches for Text Independent Speaker Recognition Using Source Feature

This paper evaluates statistical approaches for a text-independent speaker recognition system using a source feature. The linear prediction (LP) residual is used as a representation of the excitation information in speech. The speaker-specific information in the excitation of voiced speech is captured using statistical approaches such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The decrease in error during training, together with recognition accuracy close to 100 percent during the testing phase, demonstrates that the excitation component of speech contains speaker-specific information and that this information is captured more effectively by a continuous ergodic HMM than by a GMM. The performance of the speaker recognition system is evaluated for a GMM and a 2-state ergodic HMM with different numbers of mixture components and different test speech durations. We demonstrate the speaker recognition studies on the TIMIT database for both the GMM and the ergodic HMM.
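As an illustration of the source feature mentioned above, the LP residual of a speech frame can be obtained by estimating predictor coefficients and passing the frame through the resulting inverse filter. The following is a minimal sketch, not the authors' implementation: the function name `lp_residual`, the autocorrelation method, the predictor order, and the synthetic second-order test signal are all assumptions made for the example.

```python
import numpy as np


def lp_residual(frame, order=10):
    """Return (LP coefficients, LP residual) for one frame.

    Assumption: autocorrelation method; the abstract does not state
    which LP analysis variant or predictor order was used.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    # Toeplitz normal equations R a = r[1:], lightly regularized for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])

    # Inverse filter A(z) = 1 - sum_k a_k z^{-k}; its output is the residual
    # e[n] = s[n] - sum_k a_k s[n-k], with zero initial conditions.
    inverse_filter = np.concatenate(([1.0], -a))
    residual = np.convolve(frame, inverse_filter)[:n]
    return a, residual


# Hypothetical test signal: a stable second-order autoregressive process
# driven by white noise, standing in for a voiced speech frame.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
signal = np.zeros_like(noise)
for t in range(2, len(signal)):
    signal[t] = 1.3 * signal[t - 1] - 0.6 * signal[t - 2] + noise[t]

coeffs, residual = lp_residual(signal, order=2)
print("estimated LP coefficients:", coeffs)
print("residual-to-signal energy ratio:", np.sum(residual**2) / np.sum(signal**2))
```

In this sketch the residual plays the role of the excitation estimate; in a recognition system, features derived from it would then be modeled per speaker by a GMM or an ergodic HMM.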
