Snippet-Based Unsupervised Approach for Sentiment Classification of Chinese Online Reviews

Sentiment classification seeks to identify the general attitude, positive or negative, that a piece of text such as a comment or review expresses toward a subject. Most existing research on sentiment classification employs supervised learning approaches that rely on annotated data. However, sentiment is expressed differently for different subjects in different domains, and building an annotated corpus for every domain of interest is not always practical. This paper proposes an unsupervised learning approach for classifying online reviews as recommended or not recommended. The proposed method is based on search engine snippets, the summary text shown for each result on a search engine's result page. The basic assumption is that terms with similar sentiment orientation tend to co-occur. Co-occurrence is measured using the snippets a search engine returns for a query that combines the candidate term with a positive or negative seed word. Using the information in these snippets, the proposed method can estimate the association of candidate terms more accurately, which allows the sentiment orientation of customer reviews to be predicted reliably. Each review is then classified as recommended if the average sentiment orientation of its phrases is positive, and as not recommended if it is negative. The data set for this study consists of 600 Chinese online reviews of travel destinations retrieved from Ctrip.com. The approach achieves an accuracy of 76.5%. Factors that influence the accuracy of sentiment classification of Chinese online reviews are also discussed.
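The scoring and classification steps can be sketched in Python. This is a minimal, hypothetical sketch, not the paper's implementation, and it rests on assumptions the abstract does not state. The seed words 推荐 ("recommend") and 失望 ("disappointed") are hypothetical choices. Candidate phrases are assumed to be already extracted from each review. Snippets are assumed to be fetched beforehand and passed in as a dictionary (`EXAMPLE_SNIPPETS`, invented illustration data) rather than retrieved from a live search engine. The orientation score is a smoothed log2 ratio of snippet co-occurrence counts, a simplification in the spirit of Turney's PMI-IR; the paper's own formula is not given in the abstract.

```python
import math
from statistics import mean

# Hypothetical seed words; the abstract does not say which seeds were used.
POS_SEED = "推荐"  # "recommend"
NEG_SEED = "失望"  # "disappointed"

# Hypothetical pre-fetched snippets keyed by (candidate term, seed word),
# standing in for the result page of the query "term seed".
# 风景优美 = "beautiful scenery", 排队很久 = "long queues".
EXAMPLE_SNIPPETS = {
    ("风景优美", POS_SEED): ["风景优美，推荐", "风景优美，值得推荐", "交通方便"],
    ("风景优美", NEG_SEED): ["风景优美，但有点失望"],
    ("排队很久", POS_SEED): ["排队很久"],
    ("排队很久", NEG_SEED): ["排队很久，很失望", "排队很久，失望"],
}


def cooccurrence_count(snippets, term, seed):
    """Count the snippets that mention both the candidate term and the seed word."""
    return sum(1 for s in snippets if term in s and seed in s)


def semantic_orientation(term, snippet_lookup, eps=0.01):
    """Smoothed log2 ratio of positive to negative snippet co-occurrence.

    Assumes each query requests the same number of top snippets, so raw
    counts are comparable; eps avoids log(0) and division by zero.
    A term with no snippets at all scores exactly 0.0 (neutral).
    """
    pos = cooccurrence_count(snippet_lookup.get((term, POS_SEED), []), term, POS_SEED)
    neg = cooccurrence_count(snippet_lookup.get((term, NEG_SEED), []), term, NEG_SEED)
    return math.log2((pos + eps) / (neg + eps))


def classify_review(phrases, snippet_lookup):
    """Label a review by the average orientation of its candidate phrases."""
    if not phrases:
        raise ValueError("review has no candidate phrases")
    avg = mean(semantic_orientation(p, snippet_lookup) for p in phrases)
    # Positive average -> recommended; zero or negative -> not recommended.
    return "recommended" if avg > 0 else "not recommended"


if __name__ == "__main__":
    print(classify_review(["风景优美"], EXAMPLE_SNIPPETS))              # recommended
    print(classify_review(["风景优美", "排队很久"], EXAMPLE_SNIPPETS))  # not recommended
```

Because the sketch matches plain substrings, a snippet containing 不推荐 ("not recommended") would still count as a positive co-occurrence, so a real system would need some form of negation handling. Treating an average of exactly zero as not recommended is likewise an arbitrary tie-break chosen for the sketch.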
