On the Determination of Optimal Model Order for GMM-Based Text-Independent Speaker Identification

Gaussian mixture models (GMMs) have recently been employed as a robust technique for speaker identification. Determining the number of Gaussian components a model needs to represent a speaker adequately is a crucial but difficult problem. This number is in fact speaker dependent, so assuming a fixed number of Gaussian components for all speakers is not justified. In this paper, we develop a procedure for roughly estimating the maximum possible model order, above which the estimation of model parameters becomes unreliable. In addition, a theoretical goodness-of-fit (GOF) measure is derived and used to estimate the number of Gaussian components needed to characterize different speakers. The estimation exploits the distribution of each speaker's training data. Experimental results indicate that the proposed technique performs comparably to other well-known model selection criteria such as the minimum description length (MDL) and the Akaike information criterion (AIC).
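
For readers unfamiliar with the two baseline criteria named above, the following is a minimal sketch of how AIC and MDL can be used to pick a model order from training log-likelihoods. It does not implement the GOF measure of the paper, whose definition is not given in this abstract. The function names, the assumption of diagonal covariance matrices, the common form MDL = -log L + (k/2) log N, and the log-likelihood values in the usage example are all illustrative assumptions, not taken from the paper.

```python
import math


def gmm_free_params(n_components, dim):
    """Free parameters of a GMM, ASSUMING diagonal covariances:
    (M - 1) mixture weights + M*dim means + M*dim variances."""
    return (n_components - 1) + 2 * n_components * dim


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def mdl(log_likelihood, n_params, n_samples):
    """MDL in its common form: -log L + (k / 2) log N (lower is better)."""
    return -log_likelihood + 0.5 * n_params * math.log(n_samples)


def select_order(log_likelihoods, dim, n_samples, criterion="mdl"):
    """Return (best_order, scores) for a dict {order: total log-likelihood}."""
    scores = {}
    for order, ll in log_likelihoods.items():
        k = gmm_free_params(order, dim)
        if criterion == "aic":
            scores[order] = aic(ll, k)
        else:
            scores[order] = mdl(ll, k, n_samples)
    best_order = min(scores, key=scores.get)
    return best_order, scores


# Hypothetical training log-likelihoods for one speaker (made-up numbers),
# with 12-dimensional feature vectors and 1000 training frames.
example_ll = {1: -5200.0, 2: -5000.0, 4: -4940.0, 8: -4930.0}
best_mdl, _ = select_order(example_ll, dim=12, n_samples=1000, criterion="mdl")
best_aic, _ = select_order(example_ll, dim=12, n_samples=1000, criterion="aic")
print("MDL picks", best_mdl, "components; AIC picks", best_aic)
```

With these made-up values the two criteria disagree, because the MDL penalty grows with the amount of training data while the AIC penalty does not. In practice the log-likelihoods would come from GMMs trained separately at each candidate order on a speaker's data.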
