Is Your Anchor Going Up or Down? Fast and Accurate Supervised Topic Models

Topic models provide insights into document collections, and their supervised extensions also capture associated document-level metadata such as sentiment. However, inference for such models is often slow and does not scale to large datasets. We build upon the “anchor” method for learning topic models to capture the relationship between metadata and latent topics by extending the vector-space representation of word co-occurrence to include metadata-specific dimensions. These additional dimensions reveal new anchor words that reflect specific combinations of metadata and topic. We show that these new latent representations predict sentiment as accurately as supervised topic models, and we find these representations more quickly without sacrificing interpretability.
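To make the idea of metadata-specific dimensions concrete, the following is a minimal sketch and not the authors' implementation. It assumes documents are arrays of integer word ids, that the metadata is a discrete label per document, and that each added column holds an estimate of p(label | word) computed from token counts. The function names (`cooccurrence`, `label_columns`, `augment`) and the toy data are hypothetical.

```python
import numpy as np

def cooccurrence(docs, vocab_size):
    """Row-normalized word co-occurrence: row i estimates p(word j | word i)."""
    Q = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        counts = np.bincount(doc, minlength=vocab_size).astype(float)
        # Count ordered pairs of distinct token positions within the document.
        Q += np.outer(counts, counts) - np.diag(counts)
    row_sums = Q.sum(axis=1, keepdims=True)
    return Q / np.maximum(row_sums, 1e-12)

def label_columns(docs, labels, vocab_size, num_labels):
    """Column l estimates p(label = l | word i) from token counts (an assumption)."""
    counts = np.zeros((vocab_size, num_labels))
    for doc, y in zip(docs, labels):
        counts[:, y] += np.bincount(doc, minlength=vocab_size)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1e-12)

def augment(docs, labels, vocab_size, num_labels):
    """Append the label columns to the co-occurrence rows, one row per word."""
    Q_bar = cooccurrence(docs, vocab_size)
    S = label_columns(docs, labels, vocab_size, num_labels)
    return np.hstack([Q_bar, S])

# Toy corpus: three documents over a four-word vocabulary, two sentiment labels.
docs = [np.array([0, 1, 1]), np.array([2, 3]), np.array([0, 2])]
labels = [1, 0, 1]
augmented = augment(docs, labels, vocab_size=4, num_labels=2)
print(augmented.shape)
```

Each word is now a point in a space with both co-occurrence and label coordinates, so an anchor-selection step run on these rows can prefer words that are distinctive for a topic and a label together. How the added columns are scaled relative to the co-occurrence columns is a design choice this sketch leaves untouched.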
