Hierarchical Latent Words Language Models for Automatic Speech Recognition

This paper presents hierarchical latent words language models (h-LWLMs) for improving automatic speech recognition (ASR) performance on out-of-domain tasks. The h-LWLM is an advanced form of the latent words language model (LWLM), a promising approach to domain-robust language modeling. The key strength of LWLMs is a latent word space that efficiently captures linguistic phenomena not present in the training data set. However, standard LWLMs cannot account for the essentially hierarchical nature of word function and meaning. Therefore, h-LWLMs employ multiple latent word spaces with a hierarchical structure, obtained by recursively estimating a latent word of a latent word. The hierarchical latent word space allows generative probabilities to be calculated flexibly for unseen words. This paper provides a definition of the h-LWLM as well as a training method. In addition, we present two implementation methods that enable h-LWLMs to be introduced into ASR tasks. Our experiments on perplexity and ASR evaluations show the effectiveness of h-LWLMs on out-of-domain tasks.
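To make the recursion concrete, the following is a minimal sketch (not the authors' implementation) of how a two-level hierarchical latent word space could score an observed word: each observed word is emitted by a latent word, which is itself generated by a deeper latent word, and the word probability marginalizes over both layers. All vocabularies, distributions, and probability values below are toy assumptions for illustration only.

```python
import itertools

# Toy vocabularies: observed words, first-level latent words, and the
# deeper latent words that generate them (the "latent word of a latent
# word" recursion, truncated here at depth two).
VOCAB = ["cat", "dog"]
LATENT1 = ["animal"]
LATENT2 = ["noun"]

# P(h2 | context): deepest latent word given the preceding context.
p_h2_given_ctx = {"noun": 1.0}
# P(h1 | h2): a latent word generated from its own latent word.
p_h1_given_h2 = {("animal", "noun"): 1.0}
# P(w | h1): the observed word emitted from the shallowest latent word.
p_w_given_h1 = {("cat", "animal"): 0.7, ("dog", "animal"): 0.3}

def word_prob(word):
    """P(w | context) = sum over h1, h2 of
    P(w | h1) * P(h1 | h2) * P(h2 | context)."""
    total = 0.0
    for h1, h2 in itertools.product(LATENT1, LATENT2):
        total += (p_w_given_h1.get((word, h1), 0.0)
                  * p_h1_given_h2.get((h1, h2), 0.0)
                  * p_h2_given_ctx.get(h2, 0.0))
    return total
```

Because the probability mass flows through shared latent words, a word unseen in the training data can still receive a nonzero probability as long as some latent word both emits it and is reachable from the context, which is the intuition behind the domain robustness claimed in the abstract.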
