WIDIT in TREC 2006 Blog Track

The Web Information Discovery Integrated Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science participated in the Blog track's opinion task in TREC 2006. The goal of the opinion task is to "uncover the public sentiment towards a given entity/target", which involves not only retrieving topically relevant blogs but also identifying those that contain opinions about the target. To complicate matters further, the blog test collection contains a considerable amount of noise, such as blogs with non-English content and non-blog content (e.g., advertisements, navigational text), which may misdirect retrieval systems. Based on our hypothesis that noise reduction (e.g., exclusion of non-English blogs and navigational text) will improve both on-topic and opinion retrieval performance, we explored various noise reduction approaches that can effectively eliminate the noise in blog data without inadvertently excluding valid content. After creating two separate indexes (with and without noise) to assess the effect of noise reduction, we tackled the opinion blog retrieval task by breaking it down into two sequential subtasks: on-topic retrieval followed by opinion classification. Our opinion retrieval approach was first to apply traditional IR methods to retrieve on-topic blogs, and then to boost the ranks of opinionated blogs based on opinion scores generated by opinion assessment methods. Our opinion module consists of four components: the Opinion Term Module, which identifies opinions based on the frequency of opinion terms (i.e., terms that occur frequently only in opinion blogs); the Rare Term Module, which uses uncommon/rare terms (e.g., “sooo good”) for opinion classification; the IU Module, which uses IU (I and you) collocations; and the Adjective-Verb Module, which uses the distributional similarity approach from computational linguistics to learn subjective language from training data.
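The two-stage approach described above (on-topic retrieval, then boosting opinionated blogs) can be illustrated with a minimal sketch. It is not the WIDIT implementation: the lexicon `OPINION_TERMS`, the scoring rule in `opinion_score`, the weighted-sum combination in `rerank`, and the weight `alpha` are all assumptions chosen for illustration, and topic scores are assumed to be normalized to [0, 1].

```python
from collections import Counter

# Hypothetical opinion lexicon; the actual WIDIT term lists are not given here.
OPINION_TERMS = {"love", "hate", "awful", "great", "terrible"}


def opinion_score(text, lexicon=OPINION_TERMS):
    """Fraction of tokens that are opinion terms (an assumed scoring rule)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] for term in lexicon) / len(tokens)


def rerank(results, texts, alpha=0.7):
    """Re-rank on-topic results by a weighted sum of topic and opinion scores.

    results: list of (doc_id, topic_score) from a first-stage retrieval run,
             with topic_score assumed to lie in [0, 1].
    texts:   dict mapping doc_id to the blog text.
    alpha:   assumed weight on the topic score (not a value from the paper).
    """
    combined = [
        (doc_id, alpha * topic + (1 - alpha) * opinion_score(texts[doc_id]))
        for doc_id, topic in results
    ]
    # Highest combined score first.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    first_stage = [("a", 0.50), ("b", 0.48)]
    blog_texts = {
        "a": "the phone has a screen",
        "b": "i love this great phone",
    }
    for doc_id, score in rerank(first_stage, blog_texts):
        print(doc_id, round(score, 3))
```

In this toy run, blog "b" starts slightly below "a" on topic score but moves ahead after re-ranking because it contains opinion terms. A fuller version would combine scores from several opinion modules rather than a single lexicon count.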
