Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis

In this paper we propose a new parallel architecture, based on Big Data technologies, for real-time sentiment analysis of microblogging posts. Polypus is a modular framework that provides the following functionalities: (1) massive text extraction from Twitter, (2) distributed non-relational storage optimized for time-range queries, (3) memory-based inter-module buffering, (4) real-time sentiment classification, (5) near real-time aggregation of keyword sentiment into time series, (6) an HTTP API to interact with the Polypus cluster, and (7) a web interface to analyze results visually. The whole architecture is self-deployable and based on Docker containers.
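As an illustration of functionality (5), the following minimal sketch shows one way already-classified posts could be rolled up into per-keyword sentiment counts over fixed time buckets. It is not taken from Polypus: the function names (`bucket_start`, `aggregate`), the 60-second bucket width, the three sentiment labels, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 60  # assumed bucket width; the paper does not specify one here


def bucket_start(ts, width=BUCKET_SECONDS):
    """Return the epoch second at which the bucket containing ts begins."""
    epoch = int(ts.timestamp())
    return epoch - epoch % width


def aggregate(posts, keywords, width=BUCKET_SECONDS):
    """Count sentiment labels per (keyword, bucket) pair.

    posts: iterable of (timestamp, text, label) tuples, where label is one of
    "positive", "negative" or "neutral" (hypothetical label set).
    """
    series = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for ts, text, label in posts:
        # Naive whitespace tokenization, purely for illustration.
        tokens = set(text.lower().split())
        for kw in keywords:
            if kw in tokens:
                series[(kw, bucket_start(ts, width))][label] += 1
    return dict(series)


# Hypothetical sample data: three posts mentioning the keyword "docker".
posts = [
    (datetime(2017, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "I love docker", "positive"),
    (datetime(2017, 1, 1, 12, 0, 40, tzinfo=timezone.utc), "docker is slow today", "negative"),
    (datetime(2017, 1, 1, 12, 1, 10, tzinfo=timezone.utc), "docker works fine", "neutral"),
]
result = aggregate(posts, {"docker"})
```

Keying the counts by bucket start time keeps each keyword's series ordered by time, which fits the time-range queries mentioned in (2); a real deployment would presumably update such counters incrementally from the stream rather than from an in-memory list.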
