IIT Kharagpur at TREC 2008 Blog Track

This paper describes our opinion retrieval system for the TREC 2008 Blog Track. We focused on five different aspects of the system. The first module focused on extracting the blog content from the surrounding junk HTML, thereby reducing the noise in the indexed content. The second module aimed at removing various kinds of spam content from real blogs. The third module retrieved the relevant documents. The fourth module filtered the retrieved set down to opinionated documents, and the fifth calculated the polarity of the sentiments in each document. The final ranked retrieval runs were based on various combinations of settings in each module so as to study the effect