Maximum likelihood linear programming data fusion for speaker recognition

Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights of a weighted-sum fusion of scores by means of a likelihood model. The maximum likelihood estimation is posed as a linear programming problem. The scores are derived from a GMM classifier working on different feature extraction techniques. Our experiments assess the robustness of the system to changes over time (different sessions) and to changes of microphone. The proposed fusion performed significantly better (error bars of two standard deviations) than a uniform weighted sum, a uniform weighted product, or the best single classifier. The computational cost of the proposed method scales with the number of scores to be fused in the same way as the simplex method for linear programming.
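As a rough illustration of score-level fusion, the sketch below compares a weighted sum with a weighted product for closed-set identification. It is not the authors' implementation: the function names, the toy scores, and the `tuned` weights are all hypothetical, and the weights are simply given by hand rather than estimated by the maximum likelihood linear program described in the abstract. The product rule is assumed here to be a weighted geometric combination of positive scores.

```python
import math

def weighted_sum(scores, weights):
    """Fuse one score per classifier as sum_i w_i * s_i."""
    return sum(w * s for w, s in zip(weights, scores))

def weighted_product(scores, weights):
    """Fuse positive scores as prod_i s_i ** w_i (assumed form of the product rule)."""
    return math.prod(s ** w for w, s in zip(weights, scores))

def identify(score_table, weights, fuse=weighted_sum):
    """Return the index of the speaker with the highest fused score.

    score_table[k][i] is the score that classifier i assigns to speaker k.
    """
    fused = [fuse(row, weights) for row in score_table]
    return max(range(len(fused)), key=fused.__getitem__)

# Hypothetical toy scores: two enrolled speakers, two classifiers
# (for example, two feature extraction front ends).
scores = [
    [0.60, 0.20],  # speaker 0
    [0.50, 0.90],  # speaker 1
]

uniform = [0.5, 0.5]  # baseline: equal weights
tuned = [0.9, 0.1]    # hand-picked weights that trust the first classifier more

print(identify(scores, uniform))                    # uniform weighted sum
print(identify(scores, uniform, weighted_product))  # uniform weighted product
print(identify(scores, tuned))                      # non-uniform weighted sum
```

With these toy numbers the uniform rules and the non-uniform weighted sum pick different speakers, which shows why the choice of weights matters and why estimating them from data can improve on the uniform baselines.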
