Sentiment analysis for Chinese reviews of movies in multi-genre based on morpheme-based features and collocations

The application of sentiment analysis, also known as opinion mining, is more difficult in Chinese than in Indo-European languages because of the compounding nature of Chinese words and phrases and the relative lack of reliable Chinese-language resources. This study used Chinese morphemes, which are monosyllabic characters that can function as individual words or be combined to form words and phrases, as seed words to classify movie reviews found on Yahoo! Taiwan. We used collocations with higher Pointwise Mutual Information (PMI), consisting of selected morpheme-level compound features, to build classifiers. The contributions of this study are: (1) proposing a method of generating domain-dependent Chinese morphemes directly from a large data set without any predefined sentiment resources; (2) building morpheme-based classifiers that are applicable across various movie genres and that were shown to produce better results than classifiers based on keywords (NTUSD and HowNet) or feature selection (TFIDF); and (3) identifying compounds that have different semantic polarities depending on context.
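To illustrate how PMI can rank candidate collocations, the following minimal sketch scores adjacent character pairs using PMI(x, y) = log2(p(x, y) / (p(x) p(y))). It is not the study's implementation: the function name `pmi_collocations`, the `min_count` parameter, and the four toy reviews are assumptions made for illustration, and the study's actual feature selection and thresholds are not described here.

```python
import math
from collections import Counter


def pmi_collocations(docs, min_count=1):
    """Score adjacent character pairs by PMI (hypothetical helper, toy sketch)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        chars = list(doc)
        unigrams.update(chars)
        # Adjacent pairs within a single document only.
        bigrams.update(zip(chars, chars[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs that are too rare to estimate reliably
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy data (assumed, not taken from the study's corpus).
toy_reviews = ["好看", "好看", "不好看", "難看"]
scores = pmi_collocations(toy_reviews)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus the rare pair 不好 scores higher than the frequent pair 好看, which reflects the known tendency of PMI to favor low-frequency pairs; a frequency cutoff such as `min_count` is one common way to limit that effect.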
