Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques

Machine learning methods for text classification have expanded from topic identification to more challenging tasks such as sentiment classification, so it is valuable to explore and compare the methods applied to sentiment classification and to investigate the factors that influence their performance. The chief aim of the present work is to compare four machine learning methods for sentiment classification of Chinese reviews. The corpus consists of 16,000 reviews collected from a website. We investigate the factors that affect performance, namely feature representation via Word-Based Unigram (WBU), Word-Based Bigram (WBB), Chinese Character-Based Bigram (CBB), and Chinese Character-Based Trigram (CBT); feature weighting schemes; and feature dimensionality. Experimental evaluations show that performance depends on these settings. We conclude that the Naive Bayes (NB) classifier obtains the best average performance on this task when using WBB and CBT features with Boolean weighting across different feature dimensionalities.
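To make the character-based representations and Boolean weighting concrete, the following is a minimal illustrative sketch, not the code used in the experiments. The function names `char_ngrams` and `boolean_vector`, the sample review, and the three-term vocabulary are all assumptions introduced for illustration. Word-based features (WBU, WBB) would additionally require a Chinese word segmenter, which is not shown here.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`, in order.

    Whitespace is dropped first; this is an assumption of the sketch,
    not a preprocessing step taken from the paper.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def boolean_vector(features, vocabulary):
    """Boolean weighting: 1 if the term occurs in the review, else 0.

    Term frequency is ignored; only presence or absence is recorded.
    """
    present = set(features)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical review text, roughly "this phone is very good".
review = "这个手机很好"

bigrams = char_ngrams(review, 2)   # CBB-style features
trigrams = char_ngrams(review, 3)  # CBT-style features

# Hypothetical vocabulary kept after feature selection.
vocabulary = ["手机", "很好", "不好"]
vector = boolean_vector(bigrams, vocabulary)

print(bigrams)
print(trigrams)
print(vector)
```

In this sketch the feature dimensionality is simply the length of `vocabulary`; varying it corresponds to keeping more or fewer selected n-grams.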