Paper Information - A Large-Scale Sentiment Data Classification for Online Reviews Under Apache Spark

Abstract: Sentiment analysis of large-scale data has become increasingly important and has attracted many researchers, pushing them toward new platforms and tools that can handle large volumes of data. In this paper, we present new evaluation experiments on sentiment analysis for a large-scale dataset of online customer reviews under the Apache Spark data processing system. Apache Spark's scalable machine learning library (MLlib) is used, and three classification techniques from the library are applied: Naive Bayes, support vector machine, and logistic regression. The results are evaluated using the accuracy metric. Experimental results show that the support vector machine classifier outperforms the Naive Bayes and logistic regression classifiers.
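The abstract compares classifiers by accuracy, i.e. the fraction of reviews whose predicted sentiment label matches the true label. The following is a minimal plain-Python sketch of that metric and of ranking several classifiers by it. It does not reproduce the paper's Spark/MLlib pipeline; the function names (`accuracy`, `rank_by_accuracy`), the model names, and the toy labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (name, accuracy) pairs sorted from most to least accurate."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, NOT results from the paper
# (1 = positive review, 0 = negative review).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],  # 7 of 8 labels correct
    "model_b": [1, 1, 1, 0, 0, 0, 1, 1],  # 5 of 8 labels correct
}

ranking = rank_by_accuracy(y_true, toy_predictions)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Accuracy is easy to interpret but can be misleading when the positive and negative classes are heavily imbalanced. The abstract does not state the class balance of the dataset, so that caveat is a general remark rather than a claim about the paper.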
