Opinion integration and summarization

As Web 2.0 applications become increasingly popular, more and more people express their opinions on the Web in various ways in real time. Such wide coverage of topics and abundance of users make the Web an extremely valuable source for mining people’s opinions about all kinds of topics. However, since opinions are usually expressed as unstructured text scattered across different sources, it is still difficult for users to digest all the opinions relevant to a specific topic with current technologies. This thesis focuses on the problem of opinion integration and summarization, whose goal is to better support user digestion of huge amounts of opinions on an arbitrary topic. To study this problem systematically, we have identified three important dimensions of opinion analysis: separation of aspects (or subtopics) of opinions, understanding of sentiments, and assessment of the quality of opinions. These dimensions form three key components of an integrated opinion summarization system. Accordingly, this thesis contributes novel and general computational techniques for three synergistic tasks: (1) integrating relevant opinions from all kinds of Web 2.0 sources and organizing them along different aspects of the topic, which not only serves as a semantic grouping of opinions but also facilitates user navigation of the huge opinion space; (2) inferring the sentiments in the opinions with respect to different aspects and different opinion holders, so as to provide users with a more detailed and informed multi-perspective view of the opinions; and (3) improving the prediction of opinion quality, which critically determines the usefulness of the information extracted from the opinions. We focus on general and robust methods that require minimal human supervision, so that the automated methods are applicable to a wide range of topics and scalable to large amounts of opinions.
This focus differentiates this thesis from work that is fine-tuned or well-trained for particular domains but is not easily adaptable to new domains. Our main idea is to exploit naturally available resources, such as structured ontologies and social networks, which serve as indirect signals and guidance for generating opinion summaries. Along this line, our proposed techniques have been shown to be effective and general enough to be applied to many potentially interesting applications in multiple domains, such as business intelligence and political science.
