Fuzzy Gaussian mixture models for speaker recognition

The Gaussian mixture model (GMM) is an important application of statistical clustering to speaker recognition. A number of prototypes are generated from the training feature vectors by representing the feature space as a mixture of Gaussian distributions. Each prototype consists of a model parameter set comprising a mean vector, a covariance matrix and a mixture weight. In fuzzy clustering, the fuzzy c-means (FCM) method is the most widely used, and the model parameters in each prototype include a fuzzy mean vector and a fuzzy covariance matrix. The GMM and FCM methods share several characteristics: both use iterative optimisation algorithms, both allow a feature vector to belong to more than one class, and in both the degrees of belonging of a vector across classes sum to one. Building on these similarities, this paper proposes an FCM-based generalisation of the GMM called the fuzzy GMM (FGMM). Fuzzy mixture weights are introduced by redefining the distances in the FCM functionals. The FGMM algorithm and its use in speaker recognition are considered. The experimental results show that, with a suitable degree of fuzziness, FGMMs are more effective than GMMs in tests on 16 speakers using the TI46 database and on 108 speakers using the ANDOSL database.
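
The property shared by the GMM and FCM methods, that each feature vector's degrees of belonging across classes sum to one, can be illustrated with a minimal sketch. This is not the FGMM algorithm of the paper: it only contrasts textbook GMM posterior probabilities with textbook FCM memberships. The function names, the diagonal covariances, the Euclidean distance and the toy data are all assumptions made for illustration; `m` stands for the degree of fuzziness.

```python
import numpy as np

def gmm_responsibilities(X, means, variances, weights):
    """Posterior probability of each mixture component for each vector.

    Assumes diagonal covariances (illustrative choice). Returns shape (T, C).
    """
    diff = X[:, None, :] - means[None, :, :]              # (T, C, D)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_pdf                 # (T, C)
    log_joint -= log_joint.max(axis=1, keepdims=True)     # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def fcm_memberships(X, centres, m=2.0, eps=1e-12):
    """Standard FCM membership degrees with squared Euclidean distance.

    m > 1 is the degree of fuzziness. Returns shape (T, C).
    """
    d2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

# Toy data: 5 two-dimensional "feature vectors" and 3 prototypes (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))
weights = np.array([0.5, 0.3, 0.2])

gmm_u = gmm_responsibilities(X, means, variances, weights)
fcm_u = fcm_memberships(X, means, m=2.0)
```

In both cases each row of the result holds one vector's degrees of belonging to the three prototypes, and each row sums to one. Values of `m` closer to 1 make the FCM memberships approach a hard assignment, while larger values spread them more evenly.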
