Application of the mutual information minimization to speaker recognition/identification improvement

In this paper we propose inverting nonlinear distortions in order to improve the recognition rates of a speaker recognition system. We study the effect of saturation on the test signals, modeling realistic situations in which the training material has been recorded under controlled conditions but the test signals present a mismatch in input signal level (saturation). The experimental results for speaker recognition show that a combination of several strategies can improve the recognition rate with saturated test sentences from 80% to 89.39%, while the result with clean speech (without saturation) is 87.76% for one microphone. For speaker identification, the same combination can reduce the minimum detection cost function with saturated test sentences from 6.42% to 4.15%, while the results with clean speech (without saturation) are 5.74% for one microphone and 7.02% for the other.
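To make the kind of mismatch concrete, the following is a minimal sketch of how a saturated test signal could be simulated. It assumes a simple hard-clipping model of saturation; the abstract does not state the actual distortion model, so the functions `saturate` and `clipped_fraction`, the threshold value, and the sine test signal are all illustrative assumptions rather than the method used in the paper.

```python
import math


def saturate(signal, threshold):
    """Hard-clip each sample to [-threshold, threshold].

    Assumption: saturation is modeled as ideal hard clipping.
    """
    return [max(-threshold, min(threshold, x)) for x in signal]


def clipped_fraction(signal, threshold):
    """Fraction of samples whose magnitude reaches or exceeds the threshold."""
    return sum(1 for x in signal if abs(x) >= threshold) / len(signal)


# Hypothetical test signal: one second of a 100 Hz sine sampled at 8 kHz.
fs = 8000
clean = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]

# Simulate a recording whose input level exceeds the available range.
distorted = saturate(clean, threshold=0.5)
```

In a setup like this, `clean` would stand in for the training condition and `distorted` for the mismatched test condition; `clipped_fraction(clean, 0.5)` gives a rough measure of how severe the saturation is.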
