On the fusion of dissimilarity-based classifiers for speaker identification

We describe a speaker identification system that uses multiple supplementary information sources to compute a combined match score for an unknown speaker. Each speaker profile in the database consists of multiple feature vector sets that can differ in scale, dimensionality, and number of vectors. The evidence from each feature set is weighted by its reliability, which is set a priori. The confidence of the identification result is also estimated. The system is evaluated on a corpus of 110 Finnish speakers. The evaluated feature sets include mel-cepstrum, LPC-cepstrum, dynamic cepstrum, the long-term averaged spectrum of the vowel /A/, and F0.
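
The fusion idea can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the score normalization, the combination rule, or the confidence estimate, so everything below is an assumption made for illustration: `set_distance` uses an average nearest-neighbor Euclidean distance, per-feature distances are normalized to sum to one across speakers so that feature sets of different scale become comparable, fusion is a weighted sum, and confidence is the relative margin between the best and second-best fused scores. The function names and the feature keys are hypothetical.

```python
import numpy as np


def set_distance(test_vectors, model_vectors):
    """Average distance from each test vector to its nearest model vector.

    Assumed dissimilarity measure; the two sets may contain different
    numbers of vectors but must share the same dimensionality.
    """
    # Pairwise Euclidean distances, shape (n_test, n_model).
    diffs = test_vectors[:, None, :] - model_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return dists.min(axis=1).mean()


def fuse_and_identify(test_sets, speaker_db, weights):
    """Identify the speaker with the smallest weighted, normalized distance.

    test_sets:  {feature_name: array of shape (n_vectors, dim)}
    speaker_db: {speaker_id: {feature_name: array of shape (m_vectors, dim)}}
    weights:    {feature_name: a priori reliability weight}
    Requires at least two speakers in speaker_db.
    """
    speakers = sorted(speaker_db)
    fused = np.zeros(len(speakers))
    for name, weight in weights.items():
        d = np.array([
            set_distance(test_sets[name], speaker_db[s][name])
            for s in speakers
        ])
        # Assumed normalization: rescale so distances sum to 1 over speakers,
        # which removes the scale differences between feature sets.
        total = d.sum()
        if total > 0:
            d = d / total
        fused += weight * d
    order = np.argsort(fused)
    best, runner_up = order[0], order[1]
    # Assumed confidence: relative margin to the second-best candidate.
    if fused[runner_up] > 0:
        confidence = 1.0 - fused[best] / fused[runner_up]
    else:
        confidence = 0.0
    return speakers[best], confidence, dict(zip(speakers, fused))
```

Calling `fuse_and_identify(test_sets, speaker_db, {"mfcc": 0.7, "f0": 0.3})` with hypothetical feature names returns the identified speaker, a confidence value between 0 and 1, and the fused score per speaker. A weighted sum is only one of several possible combination rules; it is used here because it maps directly onto per-feature reliability weights.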
