Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition

Recently, exemplar-based sparse representation phone identification features (Spif) have shown promising results on large vocabulary speech recognition tasks. However, exemplar-based techniques are computationally expensive. In this paper, we present two methods to speed up the creation of Spif features. First, we explore a technique to quickly select a subset of informative exemplars from among millions of training examples. Second, we make approximations to the sparse representation computation such that a matrix-matrix multiplication is reduced to a matrix-vector product. We present results on four large vocabulary tasks, including Broadcast News, where acoustic models are trained on 50 and 400 hours, and a Voice Search task, where models are trained on 160 and 1000 hours. On all tasks, the proposed methods speed up Spif creation by a factor of four relative to the original Spif features, and they yield improvements in word error rate (WER) when combined with a baseline HMM system.
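
To make the general idea of exemplar-based phone identification features concrete, the following is a minimal sketch and not the method of this paper. It assumes a plain Euclidean nearest-neighbor search for exemplar selection and a ridge-regularized least-squares solve in place of the sparse solver, neither of which is specified in the abstract. The function names (`select_exemplars`, `phone_id_features`) and the parameters `k` and `ridge` are hypothetical.

```python
import numpy as np


def select_exemplars(y, train_feats, k):
    """Return indices of the k training frames closest to y (Euclidean).

    Assumption: nearest-neighbor selection stands in for the paper's
    (unspecified here) fast exemplar-selection technique.
    """
    dists = np.linalg.norm(train_feats - y, axis=1)
    return np.argsort(dists)[:k]


def phone_id_features(y, train_feats, train_labels, num_classes, k=20, ridge=1e-2):
    """Represent y as a weighted combination of k exemplars, then pool
    the weights by class label to obtain one value per class.

    Assumption: ridge regression replaces the sparse solver.
    """
    idx = select_exemplars(y, train_feats, k)
    H = train_feats[idx].T  # (dim, k) dictionary whose columns are exemplars
    # Solve (H^T H + ridge * I) beta = H^T y for the exemplar weights.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(k), H.T @ y)
    # Accumulate absolute weights into the class of each selected exemplar.
    feats = np.zeros(num_classes)
    np.add.at(feats, train_labels[idx], np.abs(beta))
    total = feats.sum()
    return feats / total if total > 0 else feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two well-separated classes in 4 dimensions.
    class0 = rng.normal(0.0, 0.5, size=(20, 4))
    class1 = rng.normal(10.0, 0.5, size=(20, 4))
    train_feats = np.vstack([class0, class1])
    train_labels = np.array([0] * 20 + [1] * 20)
    test_frame = np.full(4, 10.0)
    print(phone_id_features(test_frame, train_feats, train_labels, num_classes=2, k=5))
```

In this sketch the per-frame cost is dominated by the distance computation over all training frames and by the solve over the `k` selected exemplars, which is why both the selection step and the solver are natural targets for speedups.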
