Big Data Quality Metrics for Sentiment Analysis Approaches

In an increasingly connected world, where information flows quickly and reaches a very large number of people, sentiment analysis (SA) has developed rapidly over the past ten years. The explosion of social networks has allowed anyone with internet access to express opinions publicly. Moreover, the emergence of big data has brought enormous opportunities, along with powerful storage and analytics tools, to the field of SA. However, big data also introduces new variables and constraints that could radically affect traditional SA models. New concerns, such as big data quality, therefore have to be addressed to get the most out of big data. To the best of our knowledge, no published contribution so far addresses big data quality in SA throughout its different processes. In this paper, we first highlight the most important big data quality metrics to consider in any big data project. Then, we show how these metrics can be applied specifically to SA approaches, for each phase of the big data value chain.
