Pitch Correlogram Clustering for Fast Speaker Identification

Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. For large speaker databases, however, their high computational cost at run time limits their use in online or real-time speaker identification. Two-stage identification systems have been suggested in the literature to speed up the identification process: the database is partitioned into clusters based on some proximity criterion, and only the GMMs of a single cluster are evaluated in each test. Most clustering algorithms used so far have shown limited success, apparently because the clustering and GMM feature spaces are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram, which captures the frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics such as cepstral coefficients and spectral slopes. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
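To make the two-stage idea concrete, the following is a minimal sketch under stated assumptions, not the method as defined in the paper. The abstract does not give the exact construction of the pitch correlogram, so the sketch assumes one plausible reading: a normalized 2-D histogram of consecutive-frame pitch pairs. The function names, the bin count, the 50 to 400 Hz range, the use of 0 to mark unvoiced frames, the Euclidean nearest-centroid rule, and the `score_fn` callable standing in for a per-speaker GMM likelihood are all illustrative assumptions.

```python
import numpy as np


def pitch_correlogram(pitch_hz, n_bins=16, fmin=50.0, fmax=400.0):
    """Normalized 2-D histogram of (pitch[t], pitch[t+1]) pairs.

    Assumption: a pitch value of 0 marks an unvoiced frame, and pairs
    that include an unvoiced frame are skipped.
    """
    pitch = np.asarray(pitch_hz, dtype=float)
    prev, nxt = pitch[:-1], pitch[1:]
    voiced = (prev > 0) & (nxt > 0)
    edges = np.linspace(fmin, fmax, n_bins + 1)
    hist, _, _ = np.histogram2d(prev[voiced], nxt[voiced], bins=[edges, edges])
    total = hist.sum()
    return hist / total if total > 0 else hist


def nearest_cluster(correlogram, centroids):
    """Stage 1: index of the centroid closest to the test correlogram."""
    dists = [np.linalg.norm(correlogram - c) for c in centroids]
    return int(np.argmin(dists))


def identify(pitch_hz, centroids, cluster_members, score_fn):
    """Stage 2: score only the speakers in the selected cluster.

    `score_fn(speaker_id)` is a hypothetical callable that returns the
    likelihood of the test utterance under that speaker's GMM.
    """
    cluster = nearest_cluster(pitch_correlogram(pitch_hz), centroids)
    return max(cluster_members[cluster], key=score_fn)
```

In this sketch the run-time saving comes from `identify` calling `score_fn` only for the members of one cluster instead of for every speaker in the database; the cost of the first stage is a single histogram and a handful of distance computations.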
