
Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment

The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure that they are meaningful. We introduce a framework to support such a large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, and repeated topics. Our analysis compares 10,000 topic model variants to 200 expert-provided domain concepts, and demonstrates how our framework can inform choices of model parameters, inference algorithms, and intrinsic measures of topical quality.
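The sketch below is one way to illustrate the four misalignment types. It makes several assumptions. Each topic and each reference concept is represented as a weight vector over a shared vocabulary. Every topic–concept pair is scored by cosine similarity, and a pair counts as matched when its score clears a hard threshold. Under that reading:

- a junk topic matches no concept;
- a fused topic matches several concepts;
- a missing concept is matched by no topic;
- a repeated concept is matched by several topics.

The cosine measure, the `threshold=0.5` cutoff, the toy vectors, and the helper names `cosine_similarity_matrix` and `classify_misalignment` are all hypothetical simplifications. They are not the authors' correspondence measure.

```python
import numpy as np


def cosine_similarity_matrix(topics, concepts):
    """Cosine similarity between every topic (row) and every concept (row).

    Both inputs are 2-D arrays of term weights over the same vocabulary;
    rows are assumed to be non-zero.
    """
    t = topics / np.linalg.norm(topics, axis=1, keepdims=True)
    c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
    return t @ c.T  # shape: (n_topics, n_concepts)


def classify_misalignment(sim, threshold=0.5):
    """Label topics and concepts using a hypothetical hard match threshold."""
    matches = sim >= threshold          # boolean topic-by-concept match matrix
    per_topic = matches.sum(axis=1)     # number of concepts each topic matches
    per_concept = matches.sum(axis=0)   # number of topics matching each concept
    return {
        "junk": np.flatnonzero(per_topic == 0).tolist(),       # topic matches no concept
        "fused": np.flatnonzero(per_topic > 1).tolist(),       # topic matches several concepts
        "missing": np.flatnonzero(per_concept == 0).tolist(),  # concept matched by no topic
        "repeated": np.flatnonzero(per_concept > 1).tolist(),  # concept matched by several topics
    }


# Toy example over a 4-term vocabulary (made-up vectors, for illustration only).
concepts = np.array([
    [1.0, 0.0, 0.0, 0.0],  # concept 0
    [0.0, 1.0, 0.0, 0.0],  # concept 1
    [0.0, 0.0, 1.0, 0.0],  # concept 2
])
topics = np.array([
    [0.5, 0.5, 0.0, 0.0],  # topic 0: blends concepts 0 and 1
    [0.0, 1.0, 0.0, 0.0],  # topic 1: same direction as concept 1
    [0.0, 0.0, 0.0, 1.0],  # topic 2: uses a term no concept uses
])

labels = classify_misalignment(cosine_similarity_matrix(topics, concepts))
print(labels)
# → {'junk': [2], 'fused': [0], 'missing': [2], 'repeated': [1]}
```

Here is how the toy vectors sort into the four types:

- Topic 0 splits its weight across concepts 0 and 1, so it is fused.
- Topic 2 puts all its weight on a term no concept uses, so it is junk.
- Concept 1 is matched by topics 0 and 1, so it is repeated.
- Concept 2 is matched by nothing, so it is missing.

Because every label hinges on the single cutoff, small changes to `threshold` can move topics and concepts between categories.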

Jeffrey Heer | Christopher D. Manning | Jason Chuang | Sonal Gupta
