CrowdTSC: Crowd-based Neural Networks for Text Sentiment Classification

Sentiment classification is a fundamental task in content analysis. Although deep learning has shown promising performance in text classification compared with shallow models, it still struggles to train satisfactory classifiers for text sentiment. Humans are better than machine learning models at understanding and capturing the emotional polarity of texts. In this paper, we bring human intelligence into text sentiment classification by proposing Crowd-based neural networks for Text Sentiment Classification (CrowdTSC for short). We design questions and post them on a crowdsourcing platform to collect the keywords in texts, and we use sampling and clustering to reduce the cost of crowdsourcing. We also present an attention-based neural network and a hybrid neural network, both of which incorporate the collected keywords into deep neural networks as human guidance. Extensive experiments on public datasets confirm that CrowdTSC outperforms state-of-the-art models, demonstrating the effectiveness of crowd-based keyword guidance.
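The abstract does not say how the crowd-collected keywords enter the attention-based network. The sketch below shows one plausible scheme, assumed purely for illustration and not taken from the paper: each token's attention logit is its dot product with a context vector, plus a fixed bonus `beta` when the crowd marked that token as a keyword. The function names, the `beta` parameter, and the additive-bias design are all hypothetical.

```python
import math
from typing import List, Sequence


def keyword_guided_attention(
    hidden: Sequence[Sequence[float]],
    query: Sequence[float],
    is_keyword: Sequence[bool],
    beta: float = 2.0,
) -> List[float]:
    """Return softmax attention weights over tokens, boosting crowd keywords.

    hidden:     one hidden-state vector per token (e.g. from an LSTM encoder).
    query:      context vector; learned in a real model, fixed here.
    is_keyword: True where the crowd marked the token as a sentiment keyword.
    beta:       hypothetical additive bonus applied to keyword logits.
    """
    if not hidden or len(hidden) != len(is_keyword):
        raise ValueError("need one keyword flag per (non-empty) token sequence")
    # Attention logit = <h_i, q>, plus beta for crowd-marked keywords.
    logits = [
        sum(h_j * q_j for h_j, q_j in zip(h, query)) + (beta if kw else 0.0)
        for h, kw in zip(hidden, is_keyword)
    ]
    m = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of token vectors: the text representation fed to a classifier."""
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
```

For three tokens with identical hidden states `[0.1, 0.0]`, query `[1.0, 1.0]`, and only the middle token marked as a keyword, the default `beta=2.0` gives that token about 0.79 of the attention mass, while `beta=0.0` spreads it evenly. An additive logit bias keeps the model differentiable and lets learned scores still outweigh the crowd signal when `beta` is small; the paper's actual attention and hybrid architectures may differ.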
