Improving Chinese Sentiment Analysis via Segmentation-Based Representation Using Parallel CNN

Automatic analysis of the sentiment expressed in text relies on well-designed models that exploit linguistic features. As a result, most models are language-dependent and designed for English. Chinese has more users than any other language, and a tremendous amount of Chinese text is generated daily on social media and elsewhere, yet Chinese sentiment analysis has seldom been studied. A further observation, valid in many languages, is that different segments of a text, e.g. clauses, can have different sentiment polarities. Existing deep learning models neglect this uneven sentiment distribution and take the entire text as a single input. This paper proposes a novel sentiment analysis model for Chinese text. First, the model splits a text into smaller units at punctuation marks to obtain a preliminary text representation; we call this step segmentation-based representation. Then a new parallel CNN (convolutional neural network) framework processes all segments simultaneously. The model, which we call SBR-PCNN, concatenates the representations of the segments to obtain the final representation of the text, which not only contains semantic and syntactic features but also retains the essential sequential information. The proposed method is evaluated on two Chinese sentiment classification datasets and compared with a broad range of baselines. Experimental results show that it achieves state-of-the-art results on both benchmark datasets, suggesting that the model can improve the performance of Chinese sentiment analysis.
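
The segmentation and concatenation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the delimiter set `PUNCT`, the fixed segment count `num_segments`, and all function names are assumptions made for illustration, and `encode_segment` is a toy placeholder standing in for one CNN branch.

```python
import re

# Assumed delimiter set: common Chinese and ASCII punctuation marks.
PUNCT = "，。！？；：、,.!?;:"
_SPLIT_RE = re.compile("[" + re.escape(PUNCT) + "]+")


def segment_by_punctuation(text):
    """Split a text at runs of punctuation, dropping empty pieces."""
    return [s.strip() for s in _SPLIT_RE.split(text) if s.strip()]


def pad_segments(segments, num_segments, pad=""):
    """Truncate or pad so every text yields the same number of segments."""
    kept = segments[:num_segments]
    return kept + [pad] * (num_segments - len(kept))


def encode_segment(segment):
    """Hypothetical placeholder for one CNN branch (not a real CNN).

    Returns a 2-dimensional toy feature vector:
    character count and distinct-character count.
    """
    return [float(len(segment)), float(len(set(segment)))]


def represent(text, num_segments=3):
    """Encode each segment independently, then concatenate in text order."""
    segments = pad_segments(segment_by_punctuation(text), num_segments)
    vector = []
    for seg in segments:  # iteration order preserves the segment sequence
        vector.extend(encode_segment(seg))
    return vector


if __name__ == "__main__":
    review = "这家酒店位置很好，但是房间太小了。服务不错！"
    print(segment_by_punctuation(review))
    print(represent(review))
```

Because the per-segment vectors are concatenated in their original order rather than summed or averaged, the position of each segment in the text remains visible in the final vector, which is the property the abstract refers to as retaining sequential information.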
