Social media analytics - Challenges in topic discovery, data collection, and data preparation

Abstract: Since an ever-increasing part of the population makes use of social media in their day-to-day lives, social media data is being analysed in many different disciplines. The social media analytics process involves four distinct steps: data discovery, collection, preparation, and analysis. While there is a great deal of literature on the challenges and difficulties involving specific data analysis methods, there is hardly any research on the stages of data discovery, collection, and preparation. To address this gap, we conducted an extended and structured literature analysis through which we identified the challenges addressed and the solutions proposed. The literature search revealed that the volume of data was the challenge most often cited by researchers, whereas other categories have received less attention. Based on the results of the literature search, we discuss the most important challenges for researchers and present potential solutions. The findings are used to extend an existing framework on social media analytics. The article benefits researchers and practitioners who wish to collect and analyse social media data.
