Quality management architecture for social media data

Social media data has provided various insights into the behaviour of consumers and businesses. However, extracted data may be erroneous or may have originated from a malicious source, so the quality of social media data should be managed. It should also be understood how data quality can be managed across a big data pipeline, which may consist of several processing and analysis phases. The contribution of this paper is an evaluation of a data quality management architecture for social media data. Theoretical concepts from previous work have been implemented for data quality evaluation of Twitter-based data sets. In particular, a reference architecture for quality management of social media data has been extended and evaluated based on the implementation architecture. Experiments indicate that 150–800 tweets/s can be evaluated with two cloud nodes, depending on the configuration.
