Speaker recognition based on SOINN and incremental learning Gaussian mixture model

Gaussian Mixture Models has been widely used in speaker recognition during the last decades. To deal with the dynamic growth of datasets, initial clustering problem and achieving the results of clustering effectively on incremental data, an incremental adaptation method called incremental learning Gaussian mixture model (IGMM) is proposed in this paper. It was applied to speaker recognition system based on Self Organization Incremental Learning Neural Network (SOINN) and improved EM algorithm. SOINN is a Neural Network which can reach a suitable mixture number and appropriate initial cluster for each model. First, the initial training is conducted by SOINN and EM algorithm only need a limited amount of data. Then, the model would adapt to the data available in each session to enrich itself incrementally and recursively. Experiments were taken on the 1st speech separation challenge database. The results show that IGMM outperforms GMM and classical Bayesian adaptation in most of the cases.

[1]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[2]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[3]  Shen Furao,et al.  A fast nearest neighbor classifier based on self-organizing incremental neural network , 2008, Neural Networks.

[4]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[5]  Reda Jourani,et al.  Large margin Gaussian mixture models for speaker identification , 2010, INTERSPEECH.

[6]  Shen Furao,et al.  A Self-organized Incremental Network for Online Supervised Learning and Topology Learning , 2006 .

[7]  Leah H. Jamieson,et al.  Endpoint detection of isolated utterances based on a modified Teager energy measurement , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[9]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[10]  John G. Taylor,et al.  Speaker identification for security systems using reinforcement-trained pRAM neural network architectures , 2001 .

[11]  Shen Furao,et al.  Self-Organizing Incremental Neural Network and Its Application , 2010, ICANN.

[12]  Shen Furao,et al.  An enhanced self-organizing incremental neural network for online unsupervised learning , 2007, Neural Networks.

[13]  Geoffrey E. Hinton,et al.  Split and Merge EM Algorithm for Improving Gaussian Mixture Density Estimates , 2000, J. VLSI Signal Process..

[14]  Toby Berger,et al.  Efficient text-independent speaker verification with structural Gaussian mixture models and neural network , 2003, IEEE Trans. Speech Audio Process..

[15]  Stan Davis,et al.  Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .

[16]  Biing-Hwang Juang,et al.  An investigation on the use of acoustic sub-word units for automatic speech recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Douglas A. Reynolds,et al.  Robust text-independent speaker identification using Gaussian mixture speaker models , 1995, IEEE Trans. Speech Audio Process..