
Tandem Anchoring: a Multiword Anchor Approach for Interactive Topic Modeling

Interactive topic models are powerful tools for those seeking to understand large collections of text. However, existing sampling-based interactive topic modeling approaches scale poorly to large data sets. Anchor methods, which use a single word to uniquely identify a topic, offer the speed needed for interactive work but lack both a mechanism to inject prior knowledge and the intuitive semantics needed for user-facing applications. We propose combinations of words as anchors, going beyond existing single-word anchor algorithms, an approach we call “Tandem Anchors”. We begin with a synthetic investigation of this approach, then apply it to interactive topic modeling in a user study and compare it to interactive and non-interactive approaches. Tandem anchors are faster and more intuitive than existing interactive approaches.

Jordan L. Boyd-Graber | Kevin D. Seppi | Jeffrey Lund | Connor Cook
