A new generative opinion retrieval model integrating multiple ranking factors

In this paper, we present clear and formal definitions of the ranking factors that should be considered in opinion retrieval and propose a new opinion retrieval model that combines these factors simultaneously from a generative modeling perspective. The proposed model formally unifies relevance-based ranking with subjectivity detection at the document level by taking three ranking factors into consideration: topical relevance, subjectivity strength, and opinion-topic relatedness. Topical relevance measures how strongly a document relates to a given topic, and subjectivity strength indicates the likelihood that the document contains subjective information. Opinion-topic relatedness reflects whether the subjective information is expressed with respect to the topic of interest. We also show the generality of our model by deriving variants of it that represent other existing opinion retrieval approaches. Experimental results on a large-scale blog retrieval test collection demonstrate that the individual ranking factors are not only necessary in opinion retrieval but also work together to produce a better document ranking when combined. The retrieval performance of the proposed model is comparable to that of previous systems in the literature.
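
To make the interplay of the three factors concrete, the following is a minimal illustrative sketch, not the model defined in the paper. The abstract does not give the model's equations, so the sketch assumes that each factor is already available as a probability per document and that the factors are combined as a product (a sum in log space). The names `FactorScores`, `opinion_score`, `rank`, and the `use_relatedness` switch are hypothetical, and the numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FactorScores:
    """Hypothetical per-document factor estimates, each assumed to lie in (0, 1]."""
    doc_id: str
    relevance: float     # assumed: how strongly the document relates to the topic
    subjectivity: float  # assumed: likelihood the document contains subjective content
    relatedness: float   # assumed: likelihood the opinion is about the topic


def opinion_score(s: FactorScores, use_relatedness: bool = True, eps: float = 1e-12) -> float:
    """Combine the factors as a product, computed in log space (an assumption)."""
    score = math.log(max(s.relevance, eps)) + math.log(max(s.subjectivity, eps))
    if use_relatedness:
        score += math.log(max(s.relatedness, eps))
    return score


def rank(docs, use_relatedness: bool = True):
    """Return documents sorted from highest to lowest combined score."""
    return sorted(docs, key=lambda s: opinion_score(s, use_relatedness), reverse=True)


# Made-up example values, not experimental data.
docs = [
    FactorScores("d1", relevance=0.9, subjectivity=0.8, relatedness=0.1),
    FactorScores("d2", relevance=0.7, subjectivity=0.6, relatedness=0.9),
    FactorScores("d3", relevance=0.8, subjectivity=0.2, relatedness=0.5),
]

full_ranking = [d.doc_id for d in rank(docs)]
reduced_ranking = [d.doc_id for d in rank(docs, use_relatedness=False)]
```

Under these assumptions, dropping the relatedness term leaves a ranking based only on relevance and subjectivity, which loosely mirrors the idea that simpler approaches can be viewed as special cases of a model with more factors. In the toy data, `d1` is relevant and opinionated but its opinions are mostly off-topic, so it ranks first without the relatedness factor and last with it.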
