Aggregation Methods for Proximity-Based Opinion Retrieval

The enormous amount of user-generated data available on the Web provides a great opportunity to understand, analyze, and exploit people’s opinions on different topics. Traditional Information Retrieval methods consider the relevance of documents to a topic but cannot differentiate between subjective and objective documents. Opinion retrieval is a retrieval task in which a document is judged not only by its relevance to the topic but also by the amount of opinion it expresses about that topic. In this article, we address the blog post opinion retrieval task and propose methods that rank blog posts according to their relevance and opinionatedness toward a topic. We propose estimating the opinion density at each position in a document using a general opinion lexicon and kernel density functions. We propose and investigate different models for aggregating the opinion density at query term positions to estimate the opinion score of every document. We then combine the opinion score with the relevance score based on a probabilistic justification. Experimental results on the BLOG06 dataset show that the proposed method provides significant improvement over the standard TREC baselines. The proposed models also achieve much higher performance than all state-of-the-art methods.
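As a rough illustration of the idea of position-wise opinion density and its aggregation at query term positions, the following minimal sketch may help. It is not the article's implementation: the Gaussian kernel, the bandwidth `sigma`, the toy lexicon weights, the function names, and the two aggregation choices (average and maximum) are all assumptions made here for illustration only.

```python
import math


def gaussian_kernel(distance, sigma):
    """Weight that decays with the distance between two positions (assumed kernel)."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def opinion_density(tokens, lexicon, position, sigma=2.0):
    """Kernel-weighted sum of lexicon opinion weights around `position`."""
    return sum(
        lexicon.get(tok, 0.0) * gaussian_kernel(position - j, sigma)
        for j, tok in enumerate(tokens)
    )


def opinion_score(tokens, query_terms, lexicon, sigma=2.0, aggregate="avg"):
    """Aggregate the opinion density observed at the query term positions."""
    positions = [i for i, tok in enumerate(tokens) if tok in query_terms]
    if not positions:
        return 0.0  # no query term in the document: no topic-related opinion evidence
    densities = [opinion_density(tokens, lexicon, i, sigma) for i in positions]
    if aggregate == "max":
        return max(densities)
    return sum(densities) / len(densities)  # default: simple average


# Hypothetical toy document and lexicon, for illustration only.
tokens = "the camera is great but the battery is terrible".split()
lexicon = {"great": 0.8, "terrible": 0.9}

avg_score = opinion_score(tokens, {"camera", "battery"}, lexicon, aggregate="avg")
max_score = opinion_score(tokens, {"camera", "battery"}, lexicon, aggregate="max")
```

In this sketch, opinion words close to a query term contribute more to that term's density than distant ones, and the aggregation step decides how the evidence from several query term positions is merged into one document-level opinion score.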
