On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. In contrast to ordinary MCS approaches, the proposed method requires neither a priori distributions nor pre-tuned parameters or pre-estimated weights, and it makes no assumptions about whether training and testing conditions match. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. Results on the YOHO database show that the presented approach can reduce the equal error rate by as much as 28% compared with the most accurate classifier, and by 11% compared with a standard method for optimizing the linear combination of classifiers.
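As a point of reference for the quantities named in the abstract, the following is a minimal sketch of a two-classifier linear score combination and an approximate equal error rate (EER) computation. It is not the on-line optimization proposed in the paper, whose details the abstract does not give. The function names, the convention that the weight `w` applies to the first classifier and `1 - w` to the second, the threshold sweep, and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def fuse(s1: float, s2: float, w: float) -> float:
    """Linear combination of two classifier scores.

    Assumed convention: w weights the first classifier, (1 - w) the second.
    """
    return w * s1 + (1.0 - w) * s2


def equal_error_rate(target: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER from target (true speaker) and impostor scores.

    Sweeps every observed score as a decision threshold (accept if
    score >= threshold) and returns the mean of the false acceptance and
    false rejection rates at the threshold where the two are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(target) | set(impostor)):
        frr = sum(s < thr for s in target) / len(target)        # targets rejected
        far = sum(s >= thr for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    # Hypothetical scores from two classifiers on the same trials.
    tgt1, imp1 = [2.0, 1.4, 0.3], [0.5, -0.2, 0.1]
    tgt2, imp2 = [0.9, 0.2, 1.1], [0.4, 0.0, -0.5]
    w = 0.5  # fixed weight chosen arbitrarily for this sketch
    fused_tgt = [fuse(a, b, w) for a, b in zip(tgt1, tgt2)]
    fused_imp = [fuse(a, b, w) for a, b in zip(imp1, imp2)]
    print("EER classifier 1:", equal_error_rate(tgt1, imp1))
    print("EER classifier 2:", equal_error_rate(tgt2, imp2))
    print("EER fused:", equal_error_rate(fused_tgt, fused_imp))
```

The fixed weight above stands in for what a conventional MCS would tune beforehand on held-out data; the scheme described in the abstract instead selects the combination on-line, without such pre-estimated weights.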
