Paper information - Sparse representation features for speech recognition

Sparse representation features for speech recognition

In this paper, we explore the use of exemplar-based sparse representations (SRs) to map test features into the linear span of training examples. We show that the frame classification accuracy with these new features is 1.3% higher than that of a Gaussian Mixture Model (GMM), indicating that SRs not only move test features closer to the training data but also move them closer to the correct class. Given these new SR features, we train a Hidden Markov Model (HMM) on them and perform recognition. On the TIMIT corpus, we show that applying the SR features on top of our best discriminatively trained system yields a 0.7% absolute reduction in phonetic error rate (PER), from 19.9% to 19.2%. After applying model adaptation, we reduce the PER further to 19.0%, the best result on TIMIT to date. Furthermore, on a large-vocabulary 50-hour broadcast news task, we achieve a 0.3% absolute reduction in word error rate (WER), demonstrating the benefit of this method for large-vocabulary speech recognition.
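The core idea of mapping a test feature into the linear span of training examples can be illustrated with a minimal sketch. It assumes the test frame y is approximated as H·β, where the columns of H are training exemplars and β is sparse, and that the new SR feature is the reconstruction H·β. The abstract does not name the solver, so plain ISTA (an l1-regularised least-squares method) is swapped in here purely for illustration. All function names, the toy data, and the values of `lam` and `n_iter` are hypothetical.

```python
# Sketch only: ISTA (an l1 solver) stands in for whatever sparse solver the paper uses.

def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of the l1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def project(exemplars, beta):
    """Return H @ beta, where the columns of H are the training exemplars."""
    dim = len(exemplars[0])
    return [sum(b * ex[d] for b, ex in zip(beta, exemplars)) for d in range(dim)]


def sparse_code(exemplars, y, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||y - H beta||^2 + lam*||beta||_1 with ISTA."""
    # The squared Frobenius norm of H upper-bounds the gradient's Lipschitz
    # constant, so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(x * x for ex in exemplars for x in ex)
    beta = [0.0] * len(exemplars)
    for _ in range(n_iter):
        residual = [p - t for p, t in zip(project(exemplars, beta), y)]
        grad = [sum(r * x for r, x in zip(residual, ex)) for ex in exemplars]
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: three 2-D training exemplars and one test frame.
exemplars = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y = [0.9, 0.1]
beta = sparse_code(exemplars, y)
sr_feature = project(exemplars, beta)  # the test frame mapped into span(H)
```

In a setup like the one the abstract describes, a vector such as `sr_feature` would replace the original frame as input to the HMM. How exemplars are selected per frame and how the class labels of the active exemplars are used are not covered by this sketch.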

Tara N. Sainath | Bhuvana Ramabhadran | Dimitri Kanevsky | David Nahamoo | Abhinav Sethy

[1] Tara N. Sainath, et al. An exploration of large vocabulary tools for small vocabulary phonetic recognition, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[2] Georg Heigold, et al. Speech recognition with state-based nearest neighbour classifiers, 2007, INTERSPEECH.

[3] Brian Kingsbury, et al. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] Tara N. Sainath, et al. Bayesian compressive sensing for phonetic classification, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5] Tuomas Virtanen, et al. Noise robust exemplar-based connected digit recognition, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6] Allen Y. Yang, et al. Robust Face Recognition via Sparse Representation, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.