Paper Information - Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition

Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition

Recently, exemplar-based sparse representation phone identification features (Spif) have shown promising results on large vocabulary speech recognition tasks. However, one problem with exemplar-based techniques is that they are computationally expensive. In this paper, we present two methods to speed up the creation of Spif features. First, we explore a technique to quickly select a subset of informative exemplars among millions of training examples. Second, we make approximations to the sparse representation computation such that a matrix-matrix multiplication is reduced to a matrix-vector product. We present results on four large vocabulary tasks: Broadcast News, where acoustic models are trained with 50 and 400 hours, and Voice Search, where models are trained with 160 and 1000 hours. Results on all tasks indicate a speedup by a factor of four relative to the original Spif features, as well as improvements in word error rate (WER) in combination with a baseline HMM system.

Tara N. Sainath | Bhuvana Ramabhadran | Dimitri Kanevsky | David Nahamoo