Using Naïve Bayes Classifier to Distinguish Reviews from Non-review Documents in Chinese

Reviews are subjective documents that express opinions or evaluations, whereas non-review documents typically present factual information objectively. Separating reviews from non-reviews, a task known as subjectivity classification, is potentially important for many text processing applications, such as information extraction and information retrieval. It is also a key step in sentiment classification of online customer reviews. As a type of genre classification, distinguishing subjective from objective texts differs from traditional topic-based classification. Few studies have addressed this problem, most of them on English text, and little work exists on Chinese subjectivity classification. The techniques developed for English cannot be applied directly to Chinese because the two languages differ in their characteristics. This paper proposes an approach to subjectivity classification of Chinese text based on a supervised machine learning algorithm, Naïve Bayes. Experimental studies have been conducted on two kinds of documents written in Chinese: movie reviews and movie plots. The results show that the performance of the proposed approach is comparable to that reported in existing English subjectivity classification studies.
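As an illustration of the general idea only, the following minimal sketch trains a multinomial Naïve Bayes classifier to separate review-like from plot-like Chinese text. The abstract does not specify the features or the event model, so everything here is an assumption: the character-bigram features (chosen because Chinese has no whitespace word boundaries), the Laplace smoothing, the names `char_bigrams` and `NaiveBayes`, and the four invented training sentences, which are not taken from the paper's corpus.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams (assumed feature set)."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(char_bigrams(doc))
        # Vocabulary is the union of bigrams seen in any class.
        self.vocab = set().union(*self.feat_counts.values())
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            for feat in char_bigrams(doc):
                if feat not in self.vocab:
                    continue  # ignore bigrams never seen in training
                count = self.feat_counts[c][feat]
                score += math.log((count + 1) / (self.totals[c] + vocab_size))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Invented toy data for illustration; not the paper's movie corpus.
train_docs = [
    "这部电影太精彩了，我非常喜欢",      # opinionated, review-like
    "演员的表演很差，剧情让人失望",      # opinionated, review-like
    "故事发生在上海，主角是一名警察",    # factual, plot-like
    "他离开家乡，前往北京寻找父亲",      # factual, plot-like
]
train_labels = ["review", "review", "plot", "plot"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("我觉得这部电影很精彩"))
print(model.predict("主角前往上海寻找父亲"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and add-one smoothing keeps a bigram unseen in one class from forcing that class's probability to zero. A word-segmented or part-of-speech-based feature set could be substituted for `char_bigrams` without changing the classifier.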
