Microblog Sentiment Classification Using Parallel SVM in Apache Spark

In the information age, sentiment classification of internet topics is of great significance. This paper proposes a microblog sentiment classification approach based on a parallel support vector machine (SVM). The proposed method combines microblog-specific features with preprocessing so that the data are suitable for sentiment classification. After preprocessing, a parallel SVM running on Apache Spark performs the classification. SVM is one of the most popular algorithms for text classification and is well suited to small-scale and nonlinear problems. However, SVM training becomes very slow on big data. We use Spark to parallelize an SVM with a radial basis function (RBF) kernel. Compared with Hadoop, Apache Spark delivers markedly better performance for machine learning workloads. Experiments show that Spark significantly increases the execution speed of SVM. At the same time, classification accuracy is improved by applying information gain (IG) during preprocessing and by selecting the kernel function parameters.
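The abstract names two accuracy-related ingredients, IG in preprocessing and the RBF kernel, without giving their formulas. The following is a minimal single-machine sketch of the standard textbook definitions, not the paper's implementation. It assumes each microblog has already been tokenized into a set of terms and labeled with a sentiment class; the function names (`information_gain`, `select_terms`, `rbf_kernel`) and the parameter name `gamma` are illustrative assumptions, and the Spark parallelization is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document': H(C) - H(C | term).

    Hypothetical helper; assumes docs is a non-empty list of token sets.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_terms(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) for equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Toy usage with made-up token sets (not data from the paper).
docs = [{"great", "movie"}, {"great", "fun"}, {"awful", "movie"}, {"awful", "boring"}]
labels = ["pos", "pos", "neg", "neg"]
print(select_terms(docs, labels, 2))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=0.5))
```

In this toy data, "great" and "awful" perfectly separate the classes and receive the maximum IG, while "movie" appears equally in both classes and receives none. In an RBF-kernel SVM, `gamma` is one of the kernel parameters that must be selected: larger values make each training point's influence more local.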
