Discriminative training of the feature space using the minimum phone error (MPE) objective function has been shown to yield remarkable accuracy improvements. These gains, however, come at the cost of a large amount of memory required to store the transform. In a previous paper we reduced this memory requirement by 94% by quantizing the transform parameters. We used dimension-dependent quantization tables and learned the quantization values with a fixed assignment of transform parameters to quantization values. In this paper we refine and extend these techniques to attain a further 35% reduction in memory with no degradation in sentence error rate. We describe a principled method to assign the transform parameters to quantization values. We also show how the memory can be gradually reduced using a Viterbi algorithm that optimally assigns a variable number of bits to the dimension-dependent quantization tables. The techniques described could also be applied to the quantization of general linear transforms, a problem that should be of wider interest.
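The two ingredients named above (learning per-dimension quantization values with an assignment of parameters to those values, and a dynamic-programming search over per-dimension bit widths) can be illustrated with a minimal sketch. The sketch below is an assumption of the general approach, not the paper's actual algorithm: it uses a standard Lloyd (k-means) scalar quantizer per dimension and a Viterbi-style DP over dimensions under a total bit budget; the function names `quantize_dim` and `allocate_bits` are hypothetical.

```python
import numpy as np

def quantize_dim(values, n_bits, n_iter=20):
    """Lloyd (k-means) scalar quantizer for one dimension's parameters.
    Returns the quantized values and the resulting squared error."""
    k = 2 ** n_bits
    # initialize codewords at evenly spaced quantiles of the data
    codebook = np.unique(np.quantile(values, np.linspace(0.0, 1.0, k)))
    for _ in range(n_iter):
        # assignment step: map each parameter to its nearest codeword
        idx = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
        # update step: each codeword becomes the mean of its cluster
        codebook = np.array([values[idx == j].mean() if np.any(idx == j)
                             else codebook[j] for j in range(codebook.size)])
    idx = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
    q = codebook[idx]
    return q, float(np.sum((values - q) ** 2))

def allocate_bits(M, candidate_bits, budget):
    """Viterbi-style DP: assign a variable number of bits to each dimension's
    quantization table, minimizing total squared error under a bit budget
    (bits counted per stored parameter index)."""
    D, N = M.shape
    # distortion incurred by each dimension at each candidate bit width
    dist = [[quantize_dim(M[d], b)[1] for b in candidate_bits] for d in range(D)]
    best = {0: (0.0, [])}  # bits used so far -> (distortion, allocation)
    for d in range(D):
        nxt = {}
        for used, (err, alloc) in best.items():
            for i, b in enumerate(candidate_bits):
                nb = used + b * N
                if nb > budget:
                    continue  # prune paths that exceed the budget
                cand = (err + dist[d][i], alloc + [b])
                if nb not in nxt or cand[0] < nxt[nb][0]:
                    nxt[nb] = cand
        best = nxt
    # best surviving path: (total squared error, bits chosen per dimension)
    return min(best.values(), key=lambda x: x[0])
```

Because the DP states are total bits consumed, relaxing the budget can only keep or lower the achievable distortion, which is what allows memory to be reduced gradually while monitoring error.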
[1] G. Zweig et al., "fMPE: discriminatively trained features for speech recognition," in Proc. IEEE ICASSP, 2005.
[2] R. A. Gopinath et al., "Maximum likelihood modeling with Gaussian distributions for classification," in Proc. IEEE ICASSP, 1998.
[3] D. Povey et al., "Improvements to fMPE for discriminative training of features," in Proc. INTERSPEECH, 2005.
[4] V. Goel et al., "Compacting discriminative feature space transforms for embedded devices," in Proc. INTERSPEECH, 2009.