Adapting Topic Models using Lexical Associations with Tree Priors

Models work best when they are optimized for the evaluation criteria that people care about. For topic models, people often care about interpretability, which can be approximated using measures of lexical association. We integrate lexical association into topic optimization using tree priors, a flexible framework that can exploit both first-order word associations and the higher-order associations captured by word embeddings. Tree priors improve topic interpretability without hurting extrinsic performance.
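As an illustration of what a first-order lexical association measure can look like, the sketch below scores word pairs by normalized pointwise mutual information (NPMI) over document-level co-occurrence. The abstract does not say which measure is used, so the choice of NPMI, the function name `npmi_scores`, and the toy corpus are all assumptions made for this example only.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_scores(docs):
    """Score co-occurring word pairs by NPMI (hypothetical helper).

    `docs` is a list of token lists. Probabilities are estimated from
    document frequencies: p(w) is the fraction of documents containing w,
    and p(w1, w2) is the fraction containing both words.
    """
    n_docs = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = sorted(set(doc))  # count each word once per document
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # keys are sorted pairs

    scores = {}
    for (w1, w2), df in pair_df.items():
        p_xy = df / n_docs
        p_x = word_df[w1] / n_docs
        p_y = word_df[w2] / n_docs
        pmi = math.log(p_xy / (p_x * p_y))
        denom = -math.log(p_xy)
        # If both words occur in every document, denom is 0 and the pair
        # carries no information; this sketch assigns it a score of 0.
        scores[(w1, w2)] = pmi / denom if denom > 0 else 0.0
    return scores


# Toy corpus, invented for illustration.
docs = [
    ["cat", "dog"],
    ["cat", "dog"],
    ["boat", "fish"],
    ["boat", "fish"],
    ["cat", "fish"],
]
scores = npmi_scores(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs that tend to appear in the same documents receive higher scores than pairs that co-occur rarely. Pairwise scores of this kind are one plausible input for grouping associated words before a tree prior is built, though the exact construction is not described in the abstract.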
