Model compression for GMM-based speaker recognition systems

For large-scale deployments of speaker verification systems, model size can be an important issue, not only for minimizing storage requirements but also for reducing the time needed to transfer models over networks. Model size is also critical for deployment to small, portable devices. In this paper we present a new model compression technique for Gaussian Mixture Model (GMM) based speaker recognition systems. For GMM systems using adaptation from a background model, the compression technique exploits the fact that speaker models are adapted from a single speaker-independent model, so not all parameters need to be stored. We present results on the 2002 NIST speaker recognition evaluation cellular telephone corpus and show that the compression technique provides a good tradeoff between compression ratio and performance loss. We are able to achieve 56:1 compression (624KB → 11KB) with only a 3.2% relative increase in EER (9.1% → 9.4%).
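The core idea described above (storing only the parameters that differ from the shared background model) can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's exact scheme: it assumes, as is common in GMM-UBM systems, that MAP adaptation updates only the component means, and it stores just the indices and quantized mean offsets of the components that actually moved. The function names, the movement threshold, and the float16 quantization are all illustrative assumptions.

```python
import numpy as np

def compress_speaker_model(ubm_means, spk_means, threshold=1e-3):
    """Keep only components whose means moved during MAP adaptation.

    Hypothetical sketch: returns the indices of moved components and
    their mean offsets quantized to float16. Components below the
    threshold are assumed identical to the UBM and are not stored.
    """
    deltas = spk_means - ubm_means
    moved = np.where(np.abs(deltas).max(axis=1) > threshold)[0]
    return moved, deltas[moved].astype(np.float16)

def decompress_speaker_model(ubm_means, moved, deltas):
    """Reconstruct the speaker model by applying stored offsets to the UBM."""
    means = ubm_means.copy()
    means[moved] += deltas.astype(means.dtype)
    return means
```

In this sketch the compressed size scales with the number of adapted components rather than the full model, which is where the large compression ratios come from when only a fraction of the mixture components are significantly adapted for a given speaker.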
