Interactive Topic Modeling with Anchor Words

The formalism of anchor words has enabled the development of fast topic modeling algorithms with provable guarantees. In this paper, we introduce a protocol that allows users to interact with anchor words to build customized and interpretable topic models. We also present experimental evidence validating the usefulness of our approach.
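To make the underlying mechanics concrete, below is a minimal sketch of anchor-based topic recovery in the spirit of Arora et al.'s anchor-words algorithm: each word's row of the row-normalized word co-occurrence matrix is expressed as a convex combination of the anchor words' rows, and topic-word distributions are obtained by Bayes' rule. This is an illustrative reconstruction, not the paper's released implementation; the function name `recover_topics` is our own, and we substitute nonnegative least squares with renormalization for the exponentiated-gradient simplex solver used in the original algorithm. An interactive protocol of the kind the abstract describes would presumably re-run this recovery step each time the user edits the anchor set.

```python
# A minimal sketch (assumptions noted above) of anchor-based topic recovery.
import numpy as np
from scipy.optimize import nnls


def recover_topics(Q, anchors):
    """Q: (V, V) word co-occurrence counts; anchors: list of K anchor word ids.
    Returns A: (V, K) topic-word matrix whose columns sum to 1."""
    V = Q.shape[0]
    K = len(anchors)
    p_w = Q.sum(axis=1)                        # (unnormalized) marginal word weights
    Q_bar = Q / Q.sum(axis=1, keepdims=True)   # row-normalize: Q_bar[i] = p(w2 | w1 = i)
    X = Q_bar[anchors]                         # (K, V) anchor rows spanning the simplex
    C = np.zeros((V, K))                       # C[i, k] ~ p(topic k | word i)
    for i in range(V):
        # Fit Q_bar[i] as a nonnegative combination of anchor rows,
        # then renormalize onto the simplex (NNLS stands in for the
        # KL-based exponentiated-gradient step of the original algorithm).
        c, _ = nnls(X.T, Q_bar[i])
        C[i] = c / max(c.sum(), 1e-12)
    A = C * p_w[:, None]                       # Bayes' rule: p(word | topic) ∝ p(topic | word) p(word)
    return A / np.maximum(A.sum(axis=0, keepdims=True), 1e-12)
```

Because recovery decomposes into V small independent problems, it is cheap to re-run from scratch after every anchor edit, which is what makes the anchor-word formalism a natural fit for interactive use.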
