Opinion retrieval and classification in blogs

Blog documents are rich in opinionated content. To incorporate opinionated text into document retrieval, this dissertation proposes a novel opinion retrieval model that retrieves blog documents expressing opinions about a given query topic and labels the opinion polarity of each retrieved document as positive, negative, or mixed. The proposed model consists of five modules. A query pre-processing module performs concept recognition and query expansion. A fact-based information retrieval module retrieves topically relevant documents, based on the concepts and individual terms of the processed query, whether or not the documents contain opinions. An opinion identification module detects opinionated text, regardless of whether the opinions relate to the query. An opinion retrieval module finds the query-related opinions, which are then combined with the factual retrieval scores to calculate the document-query opinion similarity. Finally, an opinion polarity module assigns each retrieved document a polarity label indicating the overall tone of its query-related opinions. Experimental results show that both the retrieval effectiveness and the classification accuracy of the proposed model are higher than those of other state-of-the-art systems.
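
The flow of data through the five modules can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the dissertation's method: the tiny sentiment lexicons, the term-overlap topical score, the rule that an opinion is query-related when it shares a sentence with a query term, the product used to combine the two scores, and all function names. Concept recognition and query expansion are reduced to plain tokenization.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicons; a real system would use a trained opinion classifier.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "terrible", "awful"}


@dataclass
class Result:
    doc_id: str
    score: float
    polarity: str


def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def sentences(text):
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def preprocess_query(query):
    # Stand-in for concept recognition and query expansion (assumption).
    return set(tokens(query))


def topical_score(query_terms, text):
    # Fact-based retrieval: fraction of query terms present in the document.
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokens(text))) / len(query_terms)


def opinion_sentences(text):
    # Opinion identification: sentences with a lexicon word, query or not.
    return [s for s in sentences(text) if set(tokens(s)) & (POSITIVE | NEGATIVE)]


def related_opinions(query_terms, text):
    # Opinion retrieval: keep opinions that co-occur with a query term (assumption).
    return [s for s in opinion_sentences(text) if set(tokens(s)) & query_terms]


def polarity(related):
    words = [w for s in related for w in tokens(s)]
    pos = any(w in POSITIVE for w in words)
    neg = any(w in NEGATIVE for w in words)
    if pos and neg:
        return "mixed"
    return "positive" if pos else "negative" if neg else "none"


def retrieve(query, docs):
    query_terms = preprocess_query(query)
    results = []
    for doc_id, text in docs.items():
        topical = topical_score(query_terms, text)
        related = related_opinions(query_terms, text)
        if topical == 0.0 or not related:
            continue  # off-topic, or no opinion about the query
        opinion_share = len(related) / len(sentences(text))
        # Combining by product is an assumption, not the dissertation's formula.
        results.append(Result(doc_id, topical * opinion_share, polarity(related)))
    return sorted(results, key=lambda r: r.score, reverse=True)
```

In this sketch, a document that mentions the query topic but voices opinions only about something else is dropped, which mirrors the distinction the model draws between opinion identification and opinion retrieval.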