Hierarchical Latent Words Language Models for Automatic Speech Recognition

This paper presents hierarchical latent words language models (h-LWLMs) for improving automatic speech recognition (ASR) performance on out-of-domain tasks. The h-LWLM is an advanced form of the latent words language model (LWLM), a promising approach to domain-robust language modeling. The key strength of LWLMs is a latent word space that efficiently captures linguistic phenomena not present in the training data set. However, standard LWLMs cannot account for the essentially hierarchical nature of word function and meaning. Therefore, h-LWLMs employ multiple latent word spaces with a hierarchical structure, obtained by recursively estimating a latent word of a latent word. The hierarchical latent word space allows generative probabilities to be calculated flexibly for unseen words. This paper provides a definition of the h-LWLM as well as a training method. In addition, we present two implementation methods that enable h-LWLMs to be introduced into ASR tasks. Our experiments on perplexity and ASR evaluations show the effectiveness of h-LWLMs on out-of-domain tasks.
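To make the recursion concrete, the following is a minimal sketch (not the authors' implementation) of how a two-level hierarchical latent word space could score an observed word: each observed word is emitted by a latent word, which is itself generated by a deeper latent word, and the word probability marginalizes over both layers. All vocabularies, distributions, and probability values below are toy assumptions for illustration only.

```python
import itertools

# Toy vocabularies: observed words, first-level latent words, and the
# deeper latent words that generate them (the "latent word of a latent
# word" recursion, truncated here at depth two).
VOCAB = ["cat", "dog"]
LATENT1 = ["animal"]
LATENT2 = ["noun"]

# P(h2 | context): deepest latent word given the preceding context.
p_h2_given_ctx = {"noun": 1.0}
# P(h1 | h2): a latent word generated from its own latent word.
p_h1_given_h2 = {("animal", "noun"): 1.0}
# P(w | h1): the observed word emitted from the shallowest latent word.
p_w_given_h1 = {("cat", "animal"): 0.7, ("dog", "animal"): 0.3}

def word_prob(word):
    """P(w | context) = sum over h1, h2 of
    P(w | h1) * P(h1 | h2) * P(h2 | context)."""
    total = 0.0
    for h1, h2 in itertools.product(LATENT1, LATENT2):
        total += (p_w_given_h1.get((word, h1), 0.0)
                  * p_h1_given_h2.get((h1, h2), 0.0)
                  * p_h2_given_ctx.get(h2, 0.0))
    return total
```

Because the probability mass flows through shared latent words, a word unseen in the training data can still receive a nonzero probability as long as some latent word both emits it and is reachable from the context, which is the intuition behind the domain robustness claimed in the abstract.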
