Modeling online reviews with multi-grain topic models

In this paper we present a novel framework for extracting the ratable aspects of objects from online user reviews. Extracting such aspects is an important challenge in automatically mining product opinions from the web and in generating opinion-based summaries of user reviews [18, 19, 7, 12, 27, 36, 21]. Our models are based on extensions to standard topic modeling methods such as LDA and PLSA to induce multi-grain topics. We argue that multi-grain models are more appropriate for our task since standard models tend to produce topics that correspond to global properties of objects (e.g., the brand of a product type) rather than the aspects of an object that tend to be rated by a user. The models we present not only extract ratable aspects, but also cluster them into coherent topics, e.g., 'waitress' and 'bartender' are part of the same topic 'staff' for restaurants. This differentiates our approach from much of the previous work, which extracts aspects through term frequency analysis with minimal clustering. We evaluate the multi-grain models both qualitatively and quantitatively to show that they improve significantly upon standard topic models.
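For readers unfamiliar with the standard models mentioned above, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA, the kind of baseline the abstract contrasts with multi-grain models. It is not the multi-grain model itself. The function name `lda_gibbs`, the hyperparameter defaults, and the toy vocabulary and documents are all assumptions made for illustration and do not come from the paper.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA; docs are lists of word ids."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical toy data: word ids index into this illustrative vocabulary.
vocab = ["waitress", "bartender", "staff", "pasta", "dessert", "menu"]
docs = [[0, 1, 2, 0], [3, 4, 5, 3], [1, 2, 0, 5, 4]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
```

Here every topic is shared across whole documents, which is the setting in which, as argued above, topics tend to capture global properties of the reviewed objects rather than ratable aspects.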

[1]  Hanna M. Wallach,et al.  Topic modeling: beyond bag-of-words , 2006, ICML.

[2]  Tom Minka,et al.  Expectation-Propagation for the Generative Aspect Model , 2002, UAI.

[3]  Bei Yu,et al.  A cross-collection mixture model for comparative text mining , 2004, KDD.

[4]  Naftali Tishby,et al.  Distributional Clustering of English Words , 1993, ACL.

[5]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci.

[6]  Wei Li,et al.  Mixtures of hierarchical topics with Pachinko allocation , 2007, ICML '07.

[7]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[9]  Fernando Pereira,et al.  Aggregate and mixed-order Markov models for statistical language processing , 1997, EMNLP.

[10]  Wei Li,et al.  Pachinko allocation: DAG-structured mixture models of topic correlations , 2006, ICML.

[11]  Craig MacDonald,et al.  Overview of the TREC 2006 Blog Track , 2006, TREC.

[12]  Ko Fujimura,et al.  The EigenRumor Algorithm for Ranking Blogs , 2005.

[13]  Jackie Chi Kit Cheung,et al.  Multi-Document Summarization of Evaluative Text , 2013, EACL.

[14]  Michal Rosen-Zvi,et al.  Hidden Topic Markov Models , 2007, AISTATS.

[15]  Thomas L. Griffiths,et al.  Integrating Topics and Syntax , 2004, NIPS.

[16]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[17]  Xiaoyan Zhu,et al.  Movie review mining and summarization , 2006, CIKM '06.

[18]  Peter D. Turney.  Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[19]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[20]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Thomas L. Griffiths,et al.  Unsupervised Topic Modelling for Multi-Party Spoken Discourse , 2006, ACL.

[22]  Giuseppe Carenini,et al.  Extracting knowledge from evaluative text , 2005, K-CAP '05.

[23]  Xu Ling,et al.  Topic sentiment mixture: modeling facets and opinions in weblogs , 2007, WWW '07.

[24]  David M. Blei,et al.  Topic segmentation with an aspect hidden Markov model , 2001, SIGIR '01.

[25]  Bing Liu,et al.  Mining Opinion Features in Customer Reviews , 2004, AAAI.

[26]  Trevor Hastie,et al.  An exploration of sentiment summarization , 2003.

[27]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[28]  Koby Crammer,et al.  Pranking with Ranking , 2001, NIPS.

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977.

[30]  Eric K. Ringger,et al.  Pulse: Mining Customer Opinions from Free Text , 2005, IDA.

[31]  Andrew McCallum,et al.  A Note on Topical N-grams , 2005.

[32]  Janyce Wiebe,et al.  Learning Subjective Adjectives from Corpora , 2000, AAAI/IAAI.

[33]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[34]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.

[35]  Regina Barzilay,et al.  Multiple Aspect Ranking Using the Good Grief Algorithm , 2007, NAACL.

[36]  Oren Etzioni,et al.  Extracting Product Features and Opinions from Reviews , 2005, HLT.