Constrained LDA for Grouping Product Features in Opinion Mining

In opinion mining of product reviews, one often wants to summarize opinions by product feature. However, people describe the same feature with different words and phrases. To produce an effective summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature. Topic modeling is a suitable method for this task. However, instead of letting topic modeling find groupings freely, we believe it can do better when given prior knowledge in the form of automatically extracted constraints. In this paper, we first extend a popular topic modeling method, Latent Dirichlet Allocation (LDA), with the ability to process large-scale constraints. We then propose two novel methods that automatically extract two types of constraints. Finally, we apply the resulting constrained-LDA and the extracted constraints to group product features. Experiments show that constrained-LDA outperforms the original LDA and the latest mLSA by a large margin.
