Chinese comments sentiment classification based on word2vec and SVMperf

We cluster similar features using word2vec. We propose a method for sentiment classification based on word2vec and SVMperf. Word2vec can extract deep semantic features between words. SVMperf trains faster and predicts more accurately than other SVM packages. Our classification accuracy exceeds 90%.

With the rapid growth of e-commerce over the last decade, researchers have paid increasing attention to extracting valuable information from consumer comments. Sentiment classification, which classifies comments as positive or negative according to their sentiment polarity, is one such line of study. Machine learning-based methods have become mainstream for sentiment classification owing to their strong performance. Most existing research centers on extracting lexical and syntactic features, while ignoring the semantic relationships between words. In this paper, to capture semantic features, we propose a method for sentiment classification based on word2vec and SVMperf. Our work consists of two parts. First, we use word2vec to cluster similar features, demonstrating its ability to capture semantic features in the selected domain and in Chinese. Then, we train and classify the comment texts using word2vec together with SVMperf. In this process, lexicon-based and part-of-speech-based feature selection methods are each used to generate the training file. We conduct experiments on a data set of Chinese comments on clothing products. The experimental results show the superior performance of our method in sentiment classification.
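The two parts of the work described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for trained word2vec embeddings, averaging the word vectors of a comment is only one plausible way to build its feature vector (the abstract does not specify the construction), and the helper names `most_similar`, `comment_vector`, and `to_svmlight_line` are hypothetical. The only external fact relied on is that SVMperf reads training files in the SVM-light format, `<label> <index>:<value> ...`, with 1-based feature indices in increasing order.

```python
import math

# Toy embeddings standing in for trained word2vec vectors (hypothetical values).
# Glosses: 好看 "good-looking", 漂亮 "pretty", 褪色 "fades", 质量 "quality".
EMBEDDINGS = {
    "好看": [0.9, 0.1, 0.0],
    "漂亮": [0.8, 0.2, 0.1],
    "褪色": [-0.7, 0.6, 0.1],
    "质量": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, embeddings):
    """Part 1: find the vocabulary word closest to `word` in embedding space."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


def comment_vector(tokens, embeddings):
    """Part 2 (assumed construction): average the vectors of known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def to_svmlight_line(label, vector):
    """Format one example as an SVM-light line; zero-valued features are omitted."""
    feats = " ".join(
        f"{i}:{x:.4f}" for i, x in enumerate(vector, start=1) if x != 0.0
    )
    return f"{label:+d} {feats}".rstrip()


if __name__ == "__main__":
    print(most_similar("好看", EMBEDDINGS))
    # A segmented positive comment; the out-of-vocabulary token is skipped.
    tokens = ["好看", "质量", "未知"]
    print(to_svmlight_line(+1, comment_vector(tokens, EMBEDDINGS)))
```

In a real pipeline the tokens would come from a Chinese word segmenter, filtered by the lexicon-based or part-of-speech-based selection step, and the lines would be written to the training file passed to SVMperf.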
