Linear and non-linear fusion of ALISP-based and GMM systems for text-independent speaker verification

Current state-of-the-art speaker verification algorithms use Gaussian Mixture Models (GMM) to estimate the probability density function of the acoustic feature vectors. They are denoted here as global systems. To perform better, they have to be combined with other classifiers using different fusion methods. The performance of the final classifier depends on the choice of the individual classifiers and on the fusion technique used to combine them. In our previous studies we used the data-driven Automatic Language Independent Speech Processing (ALISP) method to segment the speech data as a first step of the speaker verification task. Dynamic Time Warping (DTW) was used as the distortion measure between two speech segments, and a logistic regression function was used to determine the optimal weights of the speech segments (including “silences”). This system is denoted as the ALISP-DTW system. This paper focuses on the fusion techniques used to combine the ALISP-DTW and GMM systems. We show that a non-linear fusion method (a Multi-Layer Perceptron) slightly improves the final fusion result compared to the linear fusion strategies.
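As a rough illustration of what a linear fusion of two system scores can look like, the sketch below fits a logistic-regression combiner to pairs of (ALISP-DTW score, GMM score). It is a minimal example under stated assumptions, not the configuration used in this paper: the function names, the toy scores, the learning rate and the number of epochs are all hypothetical, and logistic regression is chosen here only as one common linear fusion rule.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit w0 + w1 * s_alisp + w2 * s_gmm by batch gradient descent
    on the logistic loss. Hyperparameters are illustrative assumptions."""
    w = [0.0, 0.0, 0.0]
    n = len(scores)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (s_alisp, s_gmm), y in zip(scores, labels):
            p = sigmoid(w[0] + w[1] * s_alisp + w[2] * s_gmm)
            err = p - y  # derivative of the logistic loss w.r.t. the logit
            grad[0] += err
            grad[1] += err * s_alisp
            grad[2] += err * s_gmm
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def fuse(w, s_alisp, s_gmm):
    """Fused score: sigmoid of the weighted sum of the two system scores."""
    return sigmoid(w[0] + w[1] * s_alisp + w[2] * s_gmm)


# Hypothetical toy data: (ALISP-DTW score, GMM score) pairs.
# Label 1 = target speaker trial, label 0 = impostor trial.
toy_scores = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.6),
              (0.2, 0.3), (0.3, 0.1), (0.1, 0.4)]
toy_labels = [1, 1, 1, 0, 0, 0]

weights = train_linear_fusion(toy_scores, toy_labels)
fused = [fuse(weights, a, g) for a, g in toy_scores]
```

A non-linear fusion such as a Multi-Layer Perceptron would replace the single weighted sum with one or more hidden layers, which lets the decision boundary in the two-score plane bend instead of being a straight line.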
