We developed a two-step approach for finding blog documents that are relevant to a given query topic and contain opinionated content. The first step, the retrieval step, finds documents relevant to the query. The second step, the opinion identification step, finds the documents that contain opinions within the document set returned by the retrieval step. In the retrieval step, we try to improve retrieval effectiveness by retrieving based on concepts and by expanding the query using pseudo feedback, Wikipedia feedback, and web feedback. In the opinion identification step, we train a sentence classifier on subjective (opinionated) and objective (non-opinionated) sentences that are relevant to a query topic. The classifier labels each sentence in a given document as either subjective or objective. A document containing subjective sentences related to the query is then labeled an opinionated relevant document (ORD). We tried two strategies for ranking the ORDs, which became our two submitted runs.
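The opinion identification step can be illustrated with a minimal Python sketch. It is not the submitted system: the cue-word lexicon stands in for the trained sentence classifier, the sentence splitter is naive, and "related to the query" is approximated here by a shared query term within the sentence. All function names (`split_sentences`, `toy_classifier`, `is_ord`, `find_ords`) are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Set

# Hypothetical cue words: a stand-in for the trained sentence classifier.
SUBJECTIVE_CUES = {"love", "hate", "great", "terrible", "awful", "best", "worst"}


def split_sentences(text: str) -> List[str]:
    """Naive split after '.', '!' or '?' (an assumption, not the authors' method)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> Set[str]:
    """Lowercased word tokens of a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def toy_classifier(sentence: str) -> str:
    """Label a sentence 'subjective' if it contains any cue word, else 'objective'."""
    return "subjective" if tokens(sentence) & SUBJECTIVE_CUES else "objective"


def is_ord(document: str,
           query_terms: Iterable[str],
           classify: Callable[[str], str] = toy_classifier) -> bool:
    """True if some sentence is both subjective and mentions a query term."""
    query = {t.lower() for t in query_terms}
    for sentence in split_sentences(document):
        # Assumption: a sentence relates to the query if it shares a term with it.
        if tokens(sentence) & query and classify(sentence) == "subjective":
            return True
    return False


def find_ords(retrieved: Iterable[str],
              query_terms: Iterable[str],
              classify: Callable[[str], str] = toy_classifier) -> List[str]:
    """Filter the retrieval-step output down to opinionated relevant documents."""
    terms = list(query_terms)
    return [doc for doc in retrieved if is_ord(doc, terms, classify)]


if __name__ == "__main__":
    docs = [
        "The new iPhone was released in June. I love the iPhone camera.",
        "The iPhone was released in June. It has a camera.",
        "I hate rainy days. The iPhone ships next week.",
    ]
    for doc in find_ords(docs, ["iphone"]):
        print(doc)
```

In this sketch the third document is not kept, because its only subjective sentence does not mention the query. A trained classifier can be swapped in through the `classify` parameter without changing the document-level rule.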