Domain Adaptation Based on Mixture of Latent Words Language Models for Automatic Speech Recognition

This paper proposes a novel domain adaptation method that can utilize out-of-domain text resources and partially domain-matched text resources in language modeling. A major problem in domain adaptation is that it is hard to obtain adequate adaptation effects from out-of-domain text resources. To tackle this problem, our idea is to carry out model merging in a latent variable space created from latent words language models (LWLMs). The latent variables in LWLMs are represented as specific words selected from the observed word space, so LWLMs can share a common latent variable space. This enables us to perform flexible mixture modeling that takes the latent variable space into consideration. This paper presents two types of mixture modeling, i.e., LWLM mixture models and LWLM cross-mixture models. The LWLM mixture models perform mixture modeling in the latent word space to mitigate the domain mismatch problem. Furthermore, in the LWLM cross-mixture models, LMs individually constructed from partially matched text resources are split into two element models, each of which can be subjected to mixture modeling. For both approaches, this paper also describes methods to optimize the mixture weights using a validation data set. Experiments show that mixing in the latent word space achieves performance improvements for both the target domain and out-of-domain data compared with mixing in the observed word space.

key words: domain adaptation, mixture modeling, latent words language models, latent variable space, automatic speech recognition
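As a minimal sketch of how mixture weights for interpolated language models can be optimized on a validation set, the snippet below implements the standard EM procedure for linear interpolation of per-token component probabilities. It is not the paper's exact method; the array layout and the function name `em_mixture_weights` are assumptions for illustration, and each component LM is assumed to expose precomputed per-token probabilities on the held-out data.

```python
import numpy as np

def em_mixture_weights(comp_probs, n_iter=50, tol=1e-6):
    """Estimate linear-interpolation weights for K component LMs by EM.

    comp_probs: (T, K) array; comp_probs[t, k] is the probability that
    the k-th component LM assigns to the t-th held-out token given its
    history (precomputed by each component model).
    """
    T, K = comp_probs.shape
    w = np.full(K, 1.0 / K)              # start from uniform weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component per token
        mix = comp_probs * w             # (T, K), broadcast over rows
        denom = mix.sum(axis=1, keepdims=True)
        resp = mix / denom
        # M-step: new weights = average responsibility over the data
        w = resp.mean(axis=0)
        # Stop when held-out log-likelihood no longer improves
        ll = np.log(denom).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w

# Usage (hypothetical): per-token probabilities of two LMs on the
# validation set, stacked column-wise, yield the interpolation weights.
# probs = np.stack([lm1_token_probs, lm2_token_probs], axis=1)
# weights = em_mixture_weights(probs)
```

Each EM iteration is guaranteed not to decrease the held-out likelihood, which is why this simple fixed-point update is the usual choice for tuning interpolation weights on a validation set.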
