Extracting opinion targets from user-generated discourse with an application to recommendation systems

With the growing popularity of online shopping, most e-commerce websites nowadays allow their customers to leave feedback about their purchases. This form of customer or user interaction is also very popular among Web 2.0 websites. Online databases, e.g. of movies, give their users an incentive to participate in content creation by letting them rate films and write reviews about them. Entire websites, e.g. rateitall.com, have emerged that allow their users to rate and review virtually anything they care about. As more and more content is created and aggregated on these websites, a strong demand has emerged for automatic approaches capable of extracting structured information from mostly unstructured text. Automatically extracting the opinions expressed in thousands of user-generated texts can provide valuable data for several other tasks, such as question answering, information retrieval and summarization. All of these tasks require an opinion mining system that analyzes the individual elements of an opinion at the sentence level, i.e. the terms which express the opinion, their polarity, and what the opinion is about. In this thesis, we present a comprehensive study of the automatic extraction of opinions with a focus on opinion targets, whose extraction is an essential step towards enabling other tasks, e.g. information retrieval or question answering on opinionated content. We analyze the state of the art in opinion mining and divide it into three subtasks, one of which is the extraction of opinion targets. We perform a comparative evaluation of two unsupervised algorithms for opinion target extraction on datasets of customer reviews and blog postings spanning four domains: digital cameras, cars, movies and web services. We show how the identification of opinion expressions influences the opinion target extraction performance of each algorithm.
We also show that a simple word distance-based heuristic significantly outperforms both unsupervised algorithms, which decide whether a candidate is relevant by analyzing word frequencies in the corpus. The word distance-based heuristic reaches an F-Measure between 0.372 and 0.491 on the four datasets. We furthermore evaluate a state-of-the-art supervised algorithm for opinion target extraction and present a new approach based on Conditional Random Fields (CRF). Our approach significantly outperforms the state-of-the-art baseline on all four datasets, reaching an F-Measure between 0.497 and 0.702. Since a common problem with supervised algorithms is the domain dependence of the learned model, we also evaluate both algorithms in a cross-domain opinion target extraction task. In this setting, our CRF-based approach again outperforms the baseline on all four datasets, and it outperforms the best unsupervised approach, which by design does not suffer from domain dependence, on three of the four datasets. In the cross-domain task, the CRF-based approach reaches an F-Measure between 0.360 and 0.518 on the four datasets. Extracting opinion targets that are referenced by anaphoric expressions is a challenge frequently encountered in phrase-level opinion mining. For the first time, we integrate anaphora resolution algorithms into a supervised opinion mining system. We perform a comparative evaluation of two algorithms, in which each has to find the correct antecedent of anaphoric opinion targets. Our results indicate that the algorithm designed for high-precision anaphora resolution is better suited to the opinion mining setting. By extending the algorithm that yields the best results in its off-the-shelf configuration, we achieve significant improvements in opinion target extraction on three of the four datasets.
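To make the word distance-based heuristic and the F-Measure more concrete, here is a minimal sketch under stated assumptions: sentences are already tokenized and tagged with Penn Treebank POS tags, the positions of the opinion expressions are given, and the heuristic simply links each opinion expression to the closest noun. The helper names `nearest_noun` and `f_measure`, the restriction to single-noun candidates and the left-hand tie-breaking are illustrative assumptions, not the exact definition used in the thesis. The F-Measure is computed as the usual harmonic mean of precision and recall over exact token matches.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)


def nearest_noun(tokens: List[Token], opinion_idx: int) -> Optional[int]:
    """Index of the noun closest to the opinion expression at opinion_idx.

    Assumption: candidate targets are single nouns (POS tag starting with "NN");
    on equal distance the noun to the left wins because it is seen first.
    """
    best: Optional[int] = None
    best_dist: Optional[int] = None
    for i, (_, pos) in enumerate(tokens):
        if i == opinion_idx or not pos.startswith("NN"):
            continue
        dist = abs(i - opinion_idx)
        if best_dist is None or dist < best_dist:
            best, best_dist = i, dist
    return best


def f_measure(predicted: Set[int], gold: Set[int]) -> float:
    """Harmonic mean of precision and recall over exact token matches."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# "The zoom works well, but the menu of this camera is confusing."
tokens = [("The", "DT"), ("zoom", "NN"), ("works", "VBZ"), ("well", "RB"),
          (",", ","), ("but", "CC"), ("the", "DT"), ("menu", "NN"), ("of", "IN"),
          ("this", "DT"), ("camera", "NN"), ("is", "VBZ"), ("confusing", "JJ")]
opinion_idxs = [3, 12]  # "well", "confusing" (assumed to be given)
gold_targets = {1, 7}   # "zoom", "menu"

candidates = (nearest_noun(tokens, j) for j in opinion_idxs)
predicted = {t for t in candidates if t is not None}
print(sorted(predicted))                   # [1, 10]
print(f_measure(predicted, gold_targets))  # 0.5
```

In this toy sentence the heuristic correctly links "well" to "zoom" but attaches "confusing" to the nearer noun "camera" instead of "menu", which shows the kind of error a purely distance-based rule makes.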
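A CRF labels all tokens of a sentence jointly, so a supervised opinion target extractor of this kind can be pictured as turning each token into a feature dictionary and each gold opinion target into BIO tags. The sketch below shows one possible encoding; the feature names, the distance cap of 5 and the `B-TARGET`/`I-TARGET` label set are hypothetical choices for illustration and are not the feature set used in the thesis. Lists of such dictionaries together with label sequences are the input format accepted by CRF toolkits such as CRFsuite (for example via the `sklearn-crfsuite` wrapper), which would then learn the sequence model.

```python
from typing import Dict, List, Sequence, Tuple, Union

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)
Feature = Union[str, bool]


def token_features(tokens: Sequence[Token], opinion_idxs: Sequence[int],
                   i: int) -> Dict[str, Feature]:
    """Hypothetical per-token features for a CRF opinion target tagger."""
    word, pos = tokens[i]
    feats: Dict[str, Feature] = {
        "word.lower": word.lower(),
        "pos": pos,
        "prev_pos": tokens[i - 1][1] if i > 0 else "BOS",
        "next_pos": tokens[i + 1][1] if i + 1 < len(tokens) else "EOS",
        # does the sentence contain an opinion expression at all?
        "opinion_sentence": bool(opinion_idxs),
    }
    if opinion_idxs:
        dist = min(abs(i - j) for j in opinion_idxs)
        # word distance to the closest opinion expression, capped at 5 and
        # encoded as a string so that it acts as a categorical feature
        feats["dist"] = str(min(dist, 5))
        feats["is_opinion"] = i in opinion_idxs
    return feats


def bio_labels(n_tokens: int, target_spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Encode gold opinion targets given as (start, end-exclusive) spans as BIO tags."""
    labels = ["O"] * n_tokens
    for start, end in target_spans:
        labels[start] = "B-TARGET"
        for k in range(start + 1, end):
            labels[k] = "I-TARGET"
    return labels


# "The battery life is great": opinion expression "great", target "battery life"
tokens = [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("great", "JJ")]
X = [token_features(tokens, [4], i) for i in range(len(tokens))]
y = bio_labels(len(tokens), [(1, 3)])
print(y)  # ['O', 'B-TARGET', 'I-TARGET', 'O', 'O']
print(X[1]["dist"], X[1]["prev_pos"], X[1]["next_pos"])  # 3 DT NN
```

Encoding the distance as a string rather than a number is a deliberate choice here: CRF toolkits typically treat numeric feature values as weights, whereas a string value yields one indicator feature per distance bucket.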
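The idea of high-precision anaphora resolution mentioned above can be illustrated with a deliberately simplified resolver that only links a pronoun to an antecedent when exactly one number-compatible noun occurs in the current or the preceding sentence, and otherwise abstains. This is loosely modelled on the "unique antecedent" heuristics of high-precision resolvers such as CogNIAC; the pronoun lists, the two-sentence window and the POS-based number check are assumptions, and the sketch reproduces neither of the algorithms evaluated in the thesis nor its extension. In an opinion mining pipeline, an anaphoric opinion target such as "It" would be replaced by the returned antecedent.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# Hypothetical pronoun lists; number agreement is approximated via POS tags
SINGULAR = {"it", "its"}
PLURAL = {"they", "them", "their"}


def resolve_pronoun(sentences: List[List[Token]], sent_idx: int,
                    tok_idx: int) -> Optional[Tuple[int, int]]:
    """Return (sentence, token) of the antecedent, or None to abstain.

    A candidate is any number-compatible noun before the pronoun in the
    current sentence or anywhere in the preceding one. The resolver only
    commits when all candidates share a single word form.
    """
    pronoun = sentences[sent_idx][tok_idx][0].lower()
    if pronoun in SINGULAR:
        wanted = {"NN", "NNP"}
    elif pronoun in PLURAL:
        wanted = {"NNS", "NNPS"}
    else:
        return None
    candidates = []
    for s in range(max(0, sent_idx - 1), sent_idx + 1):
        limit = tok_idx if s == sent_idx else len(sentences[s])
        for t in range(limit):
            if sentences[s][t][1] in wanted:
                candidates.append((s, t))
    forms = {sentences[s][t][0].lower() for s, t in candidates}
    if len(forms) != 1:
        return None          # no candidate or ambiguous: favour precision
    return candidates[-1]    # most recent mention of the unique antecedent


doc = [[("I", "PRP"), ("watched", "VBD"), ("Inception", "NNP"), (".", ".")],
       [("It", "PRP"), ("was", "VBD"), ("brilliant", "JJ"), (".", ".")]]
print(resolve_pronoun(doc, 1, 0))  # (0, 2)

ambiguous = [[("The", "DT"), ("plot", "NN"), ("beats", "VBZ"), ("the", "DT"),
              ("trailer", "NN"), (".", ".")],
             [("It", "PRP"), ("is", "VBZ"), ("clever", "JJ"), (".", ".")]]
print(resolve_pronoun(ambiguous, 1, 0))  # None
```

Because the resolver abstains on the ambiguous second example, it trades recall for precision, which is the property the evaluation above suggests is valuable when resolving anaphoric opinion targets.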
Finally, we show how an opinion mining system can be successfully employed to improve another application. Recommendation systems are nowadays widely used in online platforms and desktop applications to suggest goods or works of art that users do not know yet but are likely to enjoy. The recommendations for a user U1 are determined by first profiling the taste and interests of all users of the recommendation system. The algorithm then identifies other users U2 ... Un whose taste is similar to that of U1 and recommends to U1 the items that these users enjoyed. A user's taste and interests are typically profiled by letting the user rate the items they have consumed. As mentioned above, website operators also give users the opportunity to express their ratings not only on a numerical scale but also in a free-text review. We hypothesize that the opinions expressed in these free-text reviews contain a lot of information that would allow us to model a user's taste and preferences at a very fine granularity. We show that by integrating our opinion mining system as a feature provider into a state-of-the-art recommendation system, we can significantly improve the accuracy of the recommendations, as evaluated on a dataset of movie ratings and reviews.
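The collaborative filtering procedure described above (profile all users, find users with a similar taste, recommend what they enjoyed) can be sketched as a textbook user-based neighbourhood method. In this sketch, similarity is the Pearson correlation over co-rated items and the predicted rating is a similarity-weighted average of the neighbours' ratings; these choices and the parameters `k` and `n` are assumptions for illustration, and the state-of-the-art recommender extended in the thesis is not necessarily neighbourhood-based.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = (math.sqrt(sum((a[i] - ma) ** 2 for i in common))
           * math.sqrt(sum((b[i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0


def recommend(ratings: Ratings, user: str, k: int = 2, n: int = 3) -> List[Tuple[str, float]]:
    """Top-n unseen items for `user`, predicted from the k most similar users."""
    # 1. rank all other users by taste similarity and keep the k best
    neighbours = sorted(((pearson(ratings[user], ratings[v]), v)
                         for v in ratings if v != user), reverse=True)[:k]
    # 2. similarity-weighted average rating for every item the user has not rated
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for sim, v in neighbours:
        if sim <= 0:  # ignore users with dissimilar taste
            continue
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    predictions = [(item, scores[item] / weights[item]) for item in scores]
    # 3. highest predicted ratings first
    return sorted(predictions, key=lambda p: p[1], reverse=True)[:n]


ratings: Ratings = {
    "U1": {"Alien": 5, "Heat": 1},
    "U2": {"Alien": 5, "Heat": 1, "Brazil": 4},
    "U3": {"Alien": 1, "Heat": 5, "Cars": 5},
}
print(recommend(ratings, "U1"))  # [('Brazil', 4.0)]
```

Because U3 rates the two shared films in the opposite way, the correlation with U1 is negative, so U3 is ignored and "Cars" is not recommended to U1.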
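One way to picture the fine-grained taste profile hypothesized above is to aggregate the (opinion target, polarity) pairs that an opinion mining system extracts from a user's reviews into a per-aspect preference score. The sketch below simply averages polarities per lowercased opinion target; the input format, the +1/-1 polarity encoding and the averaging are hypothetical and do not describe how the thesis actually feeds its opinion mining output into the recommendation system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Opinion = Tuple[str, int]  # hypothetical (opinion target, polarity in {-1, +1})


def aspect_profile(opinions: Iterable[Opinion]) -> Dict[str, float]:
    """Average polarity per opinion target across all reviews of one user."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for target, polarity in opinions:
        key = target.lower()  # merge "Acting" and "acting"
        totals[key] += polarity
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Opinions an opinion mining system might extract from one user's movie reviews
user_opinions = [("plot", +1), ("Acting", -1), ("plot", +1), ("acting", +1), ("soundtrack", -1)]
profile = aspect_profile(user_opinions)
print(profile)  # {'plot': 1.0, 'acting': 0.0, 'soundtrack': -1.0}
```

A profile like this could, for example, be appended to the user's rating vector as additional features, which is one plausible reading of using an opinion mining system as a feature provider.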