Machine Learning and Lexicon Based Methods for Sentiment Classification: A Survey

Sentiment classification is an important subject in text mining research, which concerns the application of automatic methods for predicting the orientation of sentiment present on text documents, with many applications on a number of areas including recommender and advertising systems, customer intelligence and information retrieval. In this paper, we provide a survey and comparative study of existing techniques for opinion mining including machine learning and lexicon-based approaches, together with evaluation metrics. Also cross-domain and cross-lingual approaches are explored. Experimental results show that supervised machine learning methods, such as SVM and naive Bayes, have higher precision, while lexicon-based methods are also very competitive because they require few effort in human-labeled document and isn't sensitive to the quantity and quality of the training dataset.

[1]  ChengXiang Zhai,et al.  A two-stage approach to domain adaptation for statistical classifiers , 2007, CIKM '07.

[2]  Marco Maggini,et al.  An EM based training algorithm for cross-language text categorization , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[3]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[4]  Vibhu O. Mittal,et al.  Comparative Experiments on Sentiment Classification for Online Product Reviews , 2006, AAAI.

[5]  Qiang Yang,et al.  Co-clustering based classification for out-of-domain documents , 2007, KDD '07.

[6]  Maite Taboada,et al.  Lexicon-Based Methods for Sentiment Analysis , 2011, CL.

[7]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[8]  John Carroll,et al.  Weakly supervised techniques for domain-independent sentiment classification , 2009, TSA@CIKM.

[9]  Guodong Zhou,et al.  Semi-Supervised Learning for Imbalanced Sentiment Classification , 2011, IJCAI.

[10]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[11]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[12]  Sung-Hyon Myaeng,et al.  Domain-specific sentiment analysis using contextual feature generation , 2009, TSA@CIKM.

[13]  Sandra Kübler,et al.  Filling the Gap: Semi-Supervised Learning for Opinion Detection Across Domains , 2011, CoNLL.

[14]  Xueqi Cheng,et al.  Adaptive co-training SVM for sentiment classification on tweets , 2013, CIKM.

[15]  Guodong Zhou,et al.  Active Learning for Cross-domain Sentiment Classification , 2013, IJCAI.

[16]  Yanjun Qi,et al.  Sentiment classification based on supervised latent n-gram analysis , 2011, CIKM '11.

[17]  Xiaojun Wan,et al.  A Comparative Study of Cross-Lingual Sentiment Classification , 2012, 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology.

[18]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[19]  Danushka Bollegala,et al.  Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus , 2013, IEEE Transactions on Knowledge and Data Engineering.

[20]  Xiaolong Wang,et al.  Active Deep Networks for Semi-Supervised Sentiment Classification , 2010, COLING.

[21]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[22]  Sotiris Kotsiantis,et al.  Text Classification Using Machine Learning Techniques , 2005 .

[23]  Hsinchun Chen,et al.  Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums , 2008, TOIS.

[24]  Shubhamoy Dey,et al.  A comparative study of feature selection and machine learning techniques for sentiment analysis , 2012, RACS.

[25]  Houfeng Wang,et al.  Cross-Lingual Mixture Model for Sentiment Classification , 2012, ACL.

[26]  Qiang Yang,et al.  Topic-bridged PLSA for cross-domain text classification , 2008, SIGIR '08.

[27]  Qiang Yang,et al.  Cross-domain sentiment classification via spectral feature alignment , 2010, WWW '10.

[28]  Philip S. Yu,et al.  A holistic lexicon-based approach to opinion mining , 2008, WSDM '08.

[29]  Xiaojun Wan,et al.  Co-Training for Cross-Lingual Sentiment Classification , 2009, ACL.

[30]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[31]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.