Error-Driven Pruning of Language Models for Virtual Assistants

Language models (LMs) for virtual assistants (VAs) are typically trained on large amounts of data, resulting in prohibitively large models that require excessive memory and/or cannot serve user requests in real time. Entropy pruning yields smaller models, but with significant degradation of effectiveness in the tail of the user request distribution. We customize entropy pruning by allowing for a keep list of infrequent n-grams that require a more relaxed pruning threshold, and propose three methods to construct the keep list. Each method has its own advantages and disadvantages with respect to LM size, ASR accuracy, and the cost of constructing the keep list. Our best LM gives an 8% average Word Error Rate (WER) reduction on a targeted test set, but is three times larger than the baseline. We also propose discriminative methods to reduce the size of the LM while retaining the majority of the WER gains achieved by the largest LM.
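For concreteness, a minimal sketch of how a keep list could interact with entropy-based pruning is given below, assuming the standard Stolcke relative-entropy criterion; the keep list K and the thresholds θ and θ_keep are illustrative symbols, not notation taken from the paper.

```latex
% Sketch only: relative-entropy (Stolcke-style) pruning with a
% keep-list-dependent threshold. K, \theta, and \theta_{\mathrm{keep}}
% are illustrative, not the paper's notation.
\[
  D(p \,\|\, p') \;=\; p(h) \sum_{w_i} p(w_i \mid h)\,
      \log \frac{p(w_i \mid h)}{p'(w_i \mid h)}
\]
% where p' is the model with the candidate n-gram (h, w) removed and
% replaced by its backoff estimate. The n-gram is pruned only if the
% entropy increase falls below a threshold that depends on keep-list
% membership:
\[
  \text{prune } (h, w) \;\iff\;
  D(p \,\|\, p') \;<\;
  \begin{cases}
    \theta_{\mathrm{keep}}, & (h, w) \in K
      \quad (\theta_{\mathrm{keep}} < \theta)\\[2pt]
    \theta, & \text{otherwise.}
  \end{cases}
\]
```

Under this reading, "more relaxed" means that keep-list n-grams must clear a lower entropy-loss bar before being discarded, so more of the tail of the request distribution survives pruning.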
