Minimum discrimination information-based language model adaptation using tiny domain corpora for intelligent personal assistants

This paper proposes a novel Language Model (LM) adaptation method based on Minimum Discrimination Information (MDI). In the proposed method, a background LM is viewed as a discrete distribution, and an adapted LM is built to be as close as possible to the background LM while satisfying unigram constraints. Unigram constraints are used because only a limited amount of domain text is available for adapting a natural language-based intelligent personal assistant system. Two unigram constraint estimation methods are proposed: one based on word frequency in the domain corpus, and one based on word similarity estimated from WordNet. With constraints estimated from word frequency in tiny domain corpora (30 to 120 seconds in length), the adapted LM's perplexity shows relative improvements of 13.9% to 16.6%. Further relative improvements of 1.5% to 2.4% are observed when WordNet is used to generate word similarities. These results point to an efficient way of re-scaling and normalizing the conditional distribution of an interpolation-based LM.
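The abstract does not give the adaptation formula, so the following is only a minimal sketch of one common way to realize MDI adaptation under unigram constraints: each background conditional probability is scaled by the ratio of domain to background unigram probability and then renormalized. The function names, the add-k smoothing, the exponent `beta`, and the toy vocabulary are all assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def unigram_from_counts(tokens, vocab, smoothing=1.0):
    """Add-k smoothed unigram estimate from a (tiny) domain corpus.

    `smoothing` is a hypothetical choice; the paper's estimator may differ.
    """
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def mdi_rescale(p_bg_cond, p_bg_uni, p_dom_uni, beta=0.5):
    """Rescale one background distribution P_bg(. | h) and renormalize.

    Each word is scaled by (P_dom(w) / P_bg(w)) ** beta, then the result
    is divided by its sum so it is a proper distribution again.
    `beta` is an assumed damping exponent in [0, 1].
    """
    scaled = {
        w: p * (p_dom_uni[w] / p_bg_uni[w]) ** beta
        for w, p in p_bg_cond.items()
    }
    z = sum(scaled.values())  # normalization term for this history h
    return {w: s / z for w, s in scaled.items()}


# Toy data (entirely made up for illustration).
vocab = ["call", "play", "music", "mom"]
p_bg_uni = {w: 0.25 for w in vocab}                 # background unigram
p_bg_cond = {"call": 0.1, "play": 0.2,              # background P(. | h)
             "music": 0.4, "mom": 0.3}
domain_tokens = ["call", "mom", "call", "mom", "play", "music"]

p_dom_uni = unigram_from_counts(domain_tokens, vocab)
adapted = mdi_rescale(p_bg_cond, p_bg_uni, p_dom_uni)
```

In this toy run, words that are relatively more frequent in the domain text than in the background ("call", "mom") gain probability mass, and the others lose it. A WordNet-based variant would change only how `p_dom_uni` is estimated, leaving the rescaling step unchanged.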
