Paragraph vector based topic model for language model adaptation

Topic modeling is an important approach to language model (LM) adaptation and has attracted research interest for a long time. Latent Dirichlet Allocation (LDA), which assumes a generative Dirichlet prior over hidden topics with bag-of-words features, has long served as the state-of-the-art topic model. Inspired by the recent development of a new paradigm of distributed paragraph representation, the paragraph vector, this work proposes a new topic model based on paragraph vectors. During training, each paragraph is mapped to a unique vector in a continuous space, and unsupervised clustering of these vectors constructs topic clusters. A topic-specific LM is then built from each cluster. During adaptation, the topic posterior is first estimated with the paragraph vector based topic model, and an adapted LM is constructed by interpolating the existing topic-specific LMs with the topic posteriors as weights. The proposed topic model is applied to N-gram LM adaptation and evaluated for perplexity on the Amazon Product Review Corpus and for character error rate (CER) on a Chinese large-vocabulary continuous speech recognition (LVCSR) task. Results show that the proposed approach yields an 11.1% relative perplexity reduction and a 1.4% relative CER reduction over the N-gram baseline, outperforming the LDA-based method proposed in previous work.
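
The following Python sketch illustrates the described pipeline end to end, assuming gensim's Doc2Vec as the paragraph vector implementation and scikit-learn's KMeans for the unsupervised clustering step. The toy corpus, the number of topics, and the softmax-over-centroid-distance posterior are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of the paragraph-vector topic model pipeline (assumptions:
# gensim Doc2Vec for paragraph vectors, KMeans for clustering, and a
# softmax over negative centroid distances as the topic posterior).
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

# Toy corpus: each "paragraph" is a tokenized document.
paragraphs = [
    "the camera battery lasts all day".split(),
    "battery life of this phone is great".split(),
    "the novel has a gripping plot".split(),
    "a beautifully written story with strong characters".split(),
]

# 1) Map each paragraph to a unique vector in continuous space.
tagged = [TaggedDocument(words=p, tags=[i]) for i, p in enumerate(paragraphs)]
pv_model = Doc2Vec(tagged, vector_size=50, window=3, min_count=1, epochs=40)

# 2) Cluster the paragraph vectors to form topic clusters.
vectors = np.vstack([pv_model.dv[i] for i in range(len(paragraphs))])
n_topics = 2
kmeans = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit(vectors)
# kmeans.labels_ assigns each training paragraph to a topic cluster; a
# topic-specific N-gram LM would then be trained on each cluster's text.

# 3) At adaptation time, infer a vector for new text and turn distances to
#    the cluster centroids into a topic posterior (softmax assumption).
new_vec = pv_model.infer_vector("this phone battery charges quickly".split())
dists = np.linalg.norm(kmeans.cluster_centers_ - new_vec, axis=1)
posterior = np.exp(-dists) / np.exp(-dists).sum()

# 4) The adapted LM interpolates the topic-specific LMs with the posterior:
#    P_adapt(w | h) = sum_t posterior[t] * P_t(w | h)
print("topic posterior:", posterior)
```

In the full system, each cluster's text would train a separate N-gram LM (e.g., with a toolkit such as SRILM), and the printed posterior would supply the interpolation weights for the adapted model.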
