FUB, IASI-CNR and University of Tor Vergata at TREC 2008 Blog Track

Abstract: We take part in the opinion and polarity retrieval tasks of the Blog track. The test collection, Blog06, was created for the Blog track in 2006 and has three main components: feeds, permalinks and home pages. The collection contains spam as well as possibly non-blog and non-English pages. For our experiments we considered only the permalinks, which amount to 3.2 million Web pages totalling 88.8 GB, each containing a post and its related comments. The evaluation metrics are precision/recall based, namely Mean Average Precision (MAP) and R-Precision (RPrec), but we also focused on Precision at 10 (P@10) because of its relevance in evaluating the effectiveness of Web search engines. As in 2007, we based our approach on the construction of ad hoc weighted dictionaries containing terms assumed to express a sentiment. The weight of a term measures how much sentiment it expresses. To construct our dictionaries automatically, we assumed that opinion-bearing words are distributed more randomly in the set of opinionated documents than semantic-bearing terms, but less randomly than non-informative terms.
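
For readers less familiar with the three evaluation measures named in the abstract, the following is a minimal sketch of how they are commonly computed for ranked retrieval. It is not the official evaluation code used by the track. The function names, the representation of a run as a list of document identifiers, and the representation of relevance judgements as a set are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k positions filled by relevant documents.

    The denominator is k even when fewer than k documents were retrieved,
    so short rankings are penalised.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    if not relevant:
        return 0.0
    return precision_at_k(ranking, relevant, len(relevant))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision values at each relevant hit, divided by the
    total number of relevant documents (retrieved or not)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, Sequence[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of average precision over all judged topics.

    A topic with judgements but no ranking in `runs` contributes 0.
    """
    if not qrels:
        return 0.0
    scores = [
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    ]
    return sum(scores) / len(scores)
```

Here `runs` maps each topic identifier to its ranked list of retrieved documents and `qrels` maps each topic to the set of documents judged relevant. For the opinion task, "relevant" would mean relevant and opinionated under the track's judgements; that mapping is left to the caller.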