Efficient Opinion Summarization on Comments with Online-LDA

Customer reviews and comments on web pages are an important source of information in our daily life. For example, we prefer to choose a hotel that has received positive comments from previous customers. Because such information exhibits the characteristics of big data, it places a heavy burden on the assimilation of customer-contributed opinions. To overcome this problem, we study an efficient opinion summarization approach for a massive set of user reviews and comments associated with an online resource, which summarizes the opinions into two categories, i.e., positive and negative. In this paper, we propose a framework that includes: (1) overcoming the big data problem of online comments using the efficient online-LDA approach; (2) selecting meaningful topics from the imbalanced data; and (3) summarizing the opinions of comments with high precision and recall. This framework differs from much of the previous work in that the topics are pre-defined and then selected for better opinion summarization. To evaluate the proposed framework, we perform experiments on a dataset of hotel reviews, which covers a variety of topics. The results show that our framework achieves a significant performance improvement on opinion summarization.
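The abstract relies on online-LDA for efficiency but does not spell out the update. Below is a minimal pure-Python sketch of online variational Bayes for LDA in the form published by Hoffman, Blei and Bach (NIPS 2010), not a reconstruction of this paper's own implementation. The class `OnlineLDA`, the helper `digamma`, the bag-of-words format `{word_id: count}`, the hyperparameter defaults (`alpha`, `eta`, `tau0`, `kappa`) and the toy corpus are illustrative assumptions. Each mini-batch runs a local E-step for the per-comment topic proportions `gamma`, then blends the batch estimate into the global topic parameters `lambda` with a decaying step size.

```python
import math
import random


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x upward via psi(x) = psi(x + 1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


class OnlineLDA:
    """Illustrative online variational Bayes for LDA; names and defaults are assumptions."""

    def __init__(self, num_topics, vocab_size, num_docs,
                 alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, seed=0):
        self.K, self.W, self.D = num_topics, vocab_size, num_docs
        self.alpha, self.eta, self.tau0, self.kappa = alpha, eta, tau0, kappa
        self.updates = 0
        rng = random.Random(seed)
        # lam[k][w]: Dirichlet parameters of q(beta_k), initialised near 1 as Gamma(100, 1/100).
        self.lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
                    for _ in range(num_topics)]

    def _exp_elog(self, params):
        # exp(E[log x_i]) for x ~ Dirichlet(params).
        total = digamma(sum(params))
        return [math.exp(digamma(p) - total) for p in params]

    def _phi(self, exp_theta, exp_beta, w):
        # Word-topic responsibilities phi_wk, normalised over topics k.
        weights = [exp_theta[k] * exp_beta[k][w] for k in range(self.K)]
        norm = sum(weights)
        return [v / norm for v in weights]

    def e_step(self, docs, iters=50):
        """docs: list of {word_id: count}. Returns per-doc gammas and topic-word sufficient statistics."""
        exp_beta = [self._exp_elog(row) for row in self.lam]
        sstats = [[0.0] * self.W for _ in range(self.K)]
        gammas = []
        for doc in docs:
            gamma = [1.0] * self.K
            for _ in range(iters):  # fixed number of coordinate-ascent sweeps
                exp_theta = self._exp_elog(gamma)
                new_gamma = [self.alpha] * self.K
                for w, n in doc.items():
                    phi = self._phi(exp_theta, exp_beta, w)
                    for k in range(self.K):
                        new_gamma[k] += n * phi[k]
                gamma = new_gamma
            # Accumulate sufficient statistics using phi from the final gamma.
            exp_theta = self._exp_elog(gamma)
            for w, n in doc.items():
                phi = self._phi(exp_theta, exp_beta, w)
                for k in range(self.K):
                    sstats[k][w] += n * phi[k]
            gammas.append(gamma)
        return gammas, sstats

    def update(self, docs):
        """Blend one mini-batch into lambda with step size rho_t = (tau0 + t) ** -kappa."""
        gammas, sstats = self.e_step(docs)
        rho = (self.tau0 + self.updates) ** (-self.kappa)
        scale = self.D / len(docs)  # treat the batch as if it were the whole corpus
        for k in range(self.K):
            for w in range(self.W):
                lam_hat = self.eta + scale * sstats[k][w]
                self.lam[k][w] = (1.0 - rho) * self.lam[k][w] + rho * lam_hat
        self.updates += 1
        return gammas, rho

    def topics(self):
        """Posterior-mean topic-word distributions E[beta_k] = lambda_k / sum(lambda_k)."""
        result = []
        for row in self.lam:
            total = sum(row)
            result.append([v / total for v in row])
        return result


# Hypothetical toy corpus: word ids 0-2 and 3-5 stand in for two review themes.
docs = [{0: 3, 1: 2, 2: 1}, {3: 2, 4: 3, 5: 1}, {0: 1, 2: 2}, {4: 2, 5: 3}]
model = OnlineLDA(num_topics=2, vocab_size=6, num_docs=len(docs))
for start in range(0, len(docs), 2):  # stream the corpus in mini-batches of two
    gammas, rho = model.update(docs[start:start + 2])
topics = model.topics()
```

Because each update touches only one mini-batch plus the K × W matrix `lambda`, memory does not grow with the number of comments, which is why the online variant suits a big-data stream of reviews. In Hoffman et al., `kappa` in (0.5, 1] makes the step sizes satisfy the Robbins-Monro conditions, and `tau0` slows down the early iterations.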
