Subjectivity Detection in English and Bengali: A CRF-based Approach

With the proliferation of online reviews and sentiments, the Web is becoming an increasingly useful and important information resource for people. As a result, automatic opinion/sentiment mining has recently become a hot research topic. Extracting opinions from text is a hard semantic problem. Subjectivity Detection is studied as a text classification problem in which texts are classified as either subjective or objective. This paper presents a Conditional Random Field (CRF) based Subjectivity Detection approach, tested on English and Bengali corpora from multiple domains to establish its effectiveness across domains. The motivation is to develop a generic, domain-independent solution architecture for a less computerized language like Bengali. A relatively simple technique requiring little human interaction is proposed for developing opinion mining resources for Bengali. The features used in the CRF-based classifier can be extracted for any new language with minimal linguistic knowledge. The final classifier achieves precision values of 76.08% and 79.90% for English and 72.16% and 74.6% for Bengali in the news and blog domains, respectively.