Sentence level topic models for associated topics extraction

In the LDA model, the independence assumptions implied by the Dirichlet prior over topic proportions make it impossible to model connections between topics. Several researchers have relaxed these assumptions and thereby obtained more powerful topic models. Following this strategy, we develop an associated topic model (ATM) that uses an association matrix to measure the association between latent topics. In the ATM, consecutive sentences are treated as meaningful units, and the topic assignments of words are jointly determined by the association matrix and sentence-level topic distributions, rather than by document-specific topic distributions alone. This yields a more realistic model of latent topic connections, in which the presence of one topic may be linked to the presence of another. We derive a collapsed Gibbs sampling algorithm for inference and parameter estimation in the ATM. Experimental results show that the ATM provides a more practical interpretation and is able to learn more strongly associated topics.
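The abstract states only that word-level topic assignments are driven jointly by an association matrix and sentence-level topic distributions, and that inference uses collapsed Gibbs sampling; it does not give the exact conditional distribution. The Python sketch below is therefore a minimal illustration under assumed simplifications: one Gibbs sweep in which the usual collapsed-LDA topic-word term is combined with a sentence-level topic count term mixed through an association matrix, so that topics associated with those already present in the sentence are favoured. The function name `gibbs_sweep` and the variables `n_kw`, `n_k`, `n_sk`, `A`, `alpha`, and `beta` are illustrative choices, not taken from the paper.

```python
import numpy as np

def gibbs_sweep(docs, z, n_kw, n_k, n_sk, A, alpha, beta, V):
    """One in-place collapsed Gibbs sweep for a sentence-level associated
    topic model (simplified sketch, not the paper's exact sampler).

    docs : list of documents; each document is a list of sentences;
           each sentence is a list of word ids in [0, V).
    z    : nested list with the same shape as docs holding topic assignments.
    n_kw : (K, V) array of topic-word counts.
    n_k  : (K,) array of topic counts.
    n_sk : dict mapping (doc index, sentence index) -> (K,) sentence topic counts.
    A    : (K, K) nonnegative association matrix between topics (assumed given).
    """
    K = n_kw.shape[0]
    for d, doc in enumerate(docs):
        for s, sent in enumerate(doc):
            for i, w in enumerate(sent):
                k_old = z[d][s][i]
                # Remove the current assignment from all counts.
                n_kw[k_old, w] -= 1
                n_k[k_old] -= 1
                n_sk[(d, s)][k_old] -= 1

                # Topic-word likelihood term (standard collapsed LDA form).
                word_term = (n_kw[:, w] + beta) / (n_k + V * beta)
                # Sentence-level term: smoothed sentence topic counts mixed
                # through the association matrix, an assumed stand-in for how
                # the ATM lets associated topics reinforce one another.
                sent_term = A @ (n_sk[(d, s)] + alpha)

                p = word_term * sent_term
                p /= p.sum()
                k_new = np.random.choice(K, p=p)

                # Add the new assignment back.
                z[d][s][i] = k_new
                n_kw[k_new, w] += 1
                n_k[k_new] += 1
                n_sk[(d, s)][k_new] += 1
```

The only change relative to a standard collapsed LDA sampler in this sketch is the sentence-level term mixed through `A`: with `A` set to the identity matrix the update reduces to per-sentence LDA, while off-diagonal entries are where associations between topics would enter.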
