Parallel Sentiment Polarity Classification Method with Substring Feature Reduction

Sentiment analysis is an important problem in machine learning that aims to identify the emotion expressed in a text corpus. However, it is a difficult task, especially on large-scale data, where feature reduction is needed. In this paper, we propose a parallel feature reduction algorithm for sentiment polarity classification based on a substring method. The proposed algorithm is implemented with parallel computing on the Hadoop platform. It is evaluated on a large data set, with the K-nearest neighbor (KNN) and Rocchio algorithms used as classifiers. Experimental results show that the proposed algorithm outperforms other commonly used methods in both classification performance and computational cost.
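The abstract does not spell out the reduction rule, so the following is only a minimal Python sketch of such a pipeline, under loudly stated assumptions; it is not the authors' algorithm. It assumes that features are character substrings of length 2 to 4, that feature reduction is plain document-frequency thresholding, and that the Hadoop map and reduce phases can be simulated in-process. All function names and parameters (`map_phase`, `reduce_phase`, `select_features`, `min_df`, `max_df_ratio`, `k`) are hypothetical. The reduced features are weighted with TF-IDF and then classified with a Rocchio (class-centroid) classifier and a cosine-similarity KNN classifier.

```python
import math
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical sketch: substring features + document-frequency reduction,
# with MapReduce phases simulated in memory (not the paper's exact method).


def substrings(text, min_len=2, max_len=4):
    """Every character substring of length min_len..max_len (assumed feature type)."""
    return [text[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(text) - n + 1)]


def map_phase(doc):
    """Mapper: emit (substring, 1) once per distinct substring of one document."""
    return [(s, 1) for s in set(substrings(doc))]


def reduce_phase(pairs):
    """Reducer: sum the values emitted for each key, giving document frequency."""
    df = defaultdict(int)
    for key, value in pairs:
        df[key] += value
    return dict(df)


def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Feature reduction: keep substrings that are neither too rare nor too common."""
    df = reduce_phase(chain.from_iterable(map(map_phase, docs)))
    limit = max_df_ratio * len(docs)
    return {s: d for s, d in df.items() if min_df <= d <= limit}


def normalize(vec):
    """Scale a sparse vector to unit L2 length (empty vectors stay empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {k: v / norm for k, v in vec.items()}


def tfidf(doc, vocab, n_docs):
    """Unit-length TF-IDF vector over the selected substring features only."""
    tf = Counter(s for s in substrings(doc) if s in vocab)
    return normalize({s: c * math.log(n_docs / vocab[s]) for s, c in tf.items()})


def cosine(a, b):
    """Dot product; equals the cosine because all vectors are unit length."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def rocchio_train(vectors, labels):
    """One normalized centroid per class (Rocchio without a negative term)."""
    sums, counts = defaultdict(lambda: defaultdict(float)), Counter(labels)
    for vec, y in zip(vectors, labels):
        for k, v in vec.items():
            sums[y][k] += v / counts[y]
    return {y: normalize(c) for y, c in sums.items()}


def rocchio_predict(centroids, vec):
    """Assign the class whose centroid is most similar to vec."""
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))


def knn_predict(vectors, labels, vec, k=3):
    """Majority vote among the k training vectors most similar to vec."""
    ranked = sorted(zip(vectors, labels), key=lambda p: cosine(vec, p[0]), reverse=True)
    return Counter(y for _, y in ranked[:k]).most_common(1)[0][0]


# Toy usage with made-up polarity examples.
train = ["nice good", "good", "nice", "ugly bad", "bad", "ugly"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
vocab = select_features(train)
vectors = [tfidf(d, vocab, len(train)) for d in train]
centroids = rocchio_train(vectors, labels)
query = tfidf("very nice", vocab, len(train))
print(rocchio_predict(centroids, query), knn_predict(vectors, labels, query))  # prints: pos pos
```

On a real Hadoop cluster, the mapper would run once per input split and the framework would group the emitted pairs by key before the reducer sees them; here that shuffle step is replaced by a single in-memory pass, which keeps the sketch self-contained but says nothing about the paper's actual speedups.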
