Integrating Proximity to Subjective Sentences for Blog Opinion Retrieval

Opinion finding is a challenging retrieval task: it has been shown to be especially difficult to improve over a strongly performing topic-relevance baseline. In this paper, we propose a novel approach to opinion finding that takes into account the proximity of query terms to subjective sentences in a document. We adapt two state-of-the-art opinion detection techniques to identify subjective sentences in the retrieved documents. Our first technique uses the OpinionFinder toolkit to classify the subjectivity of the sentences in a document. Our second technique uses a dictionary of subjective terms, automatically generated from the document collection itself, to identify the most subjective sentences in a document. We extend the Divergence From Randomness (DFR) proximity model to integrate the proximity of query terms to the subjective sentences identified by either technique. We evaluate these techniques on five different strong baselines across two query datasets from the TREC Blog track. We show that we can significantly improve over the baselines and that, in several settings, our techniques at least match the top-performing systems at the TREC Blog track.
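To make the dictionary-based pipeline more concrete, the following is a minimal, simplified sketch and not the paper's implementation. It is built on several assumptions. Terms are weighted by a log-ratio of their relative frequency in a set of opinionated documents versus the whole collection. The k highest-scoring sentences are treated as "subjective". A plain count of query-term occurrences within a token window of those sentences stands in for the DFR proximity model. The function names (`build_subjective_dictionary`, `subjective_sentence_ids`, `proximity_count`) and parameters (`min_count`, `top_n`, `k`, `window`) are hypothetical. The sketch also does not reproduce OpinionFinder's classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_subjective_dictionary(opinionated_docs, all_docs, min_count=2, top_n=50):
    """Hypothetical weighting: log2 of a term's relative frequency in the
    opinionated documents divided by its relative frequency in the collection.
    Only terms that are over-represented in opinionated text (ratio > 1) are kept."""
    op = Counter(t for d in opinionated_docs for t in tokenize(d))
    bg = Counter(t for d in all_docs for t in tokenize(d))
    op_total, bg_total = sum(op.values()), sum(bg.values())
    weights = {}
    for term, count in op.items():
        if count < min_count or bg[term] == 0:
            continue
        ratio = (count / op_total) / (bg[term] / bg_total)
        if ratio > 1.0:
            weights[term] = math.log2(ratio)
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])


def subjective_sentence_ids(sentences, dictionary, k=1):
    """Indices of the k sentences with the highest summed dictionary weight
    (sentences scoring 0 are never treated as subjective)."""
    scores = [sum(dictionary.get(t, 0.0) for t in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return {i for i in ranked[:k] if scores[i] > 0}


def proximity_count(text, query_terms, dictionary, k=1, window=5):
    """Count query-term occurrences that lie within `window` tokens of any token
    of a subjective sentence (an occurrence inside the sentence has distance 0).
    This raw count is a stand-in, not the DFR proximity model."""
    sentences = split_sentences(text)
    chosen = subjective_sentence_ids(sentences, dictionary, k)
    tokens, sent_of = [], []
    for i, sentence in enumerate(sentences):
        for t in tokenize(sentence):
            tokens.append(t)
            sent_of.append(i)
    subj_positions = [p for p, i in enumerate(sent_of) if i in chosen]
    if not subj_positions:
        return 0
    query = {q.lower() for q in query_terms}
    count = 0
    for p, t in enumerate(tokens):
        if t in query and min(abs(p - s) for s in subj_positions) <= window:
            count += 1
    return count
```

For example, with the dictionary `{"love": 2.0}` and the post "The phone was released today. I love the phone camera. Battery details follow.", only the second sentence is selected as subjective. With `window=3`, only the "phone" inside that sentence is counted. Widening the window to 5 also counts the earlier "phone". In the approach described above, such a proximity signal would instead be scored with the extended DFR proximity model and combined with the topic-relevance baseline.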
