Facilitating Twitter data analytics: Platform, language and functionality

Conducting analytics over data generated by Social Web portals such as Twitter is challenging due to the volume, variety and velocity of the data. Commonly, ad hoc pipelines are built to solve one particular use case. In this paper, we generalize across a range of typical Twitter-data use cases and identify a set of common characteristics. Based on this investigation, we present our Twitter Analytical Platform (TAP), a generic platform for conducting analytical tasks with Twitter data. The platform provides a domain-specific Twitter Analysis Language (TAL) as the interface to its functionality stack. TAL includes a set of analysis tools ranging from data collection and semantic enrichment to machine learning. With these tools, users can create and customize analytical workflows in TAL and build applications that make use of the analytics results. We showcase the applicability of our platform by building Twinder, a search engine for Twitter streams.
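
The abstract describes analytical workflows assembled from reusable tools: data collection, semantic enrichment and machine learning. The Python sketch below illustrates only that compositional idea, under stated assumptions. It is not TAL syntax and not TAP's API. All names (`Tweet`, `collect`, `enrich`, `label_by_topic`, `workflow`) are hypothetical. The "enrichment" step simply treats hashtags as entity annotations, and the "classifier" is a keyword rule standing in for a learned model.

```python
from dataclasses import dataclass, field, replace
from functools import reduce
from typing import Callable, Iterable, List


@dataclass
class Tweet:
    text: str
    entities: List[str] = field(default_factory=list)
    label: str = ""


# A workflow stage maps a batch of tweets to a new batch of tweets.
Stage = Callable[[List[Tweet]], List[Tweet]]


def collect(texts: Iterable[str]) -> List[Tweet]:
    """Hypothetical collection step: wraps raw texts (a stand-in for a stream)."""
    return [Tweet(text=t) for t in texts]


def enrich(tweets: List[Tweet]) -> List[Tweet]:
    """Hypothetical enrichment: treats each #hashtag as a lower-cased entity."""
    return [
        replace(tw, entities=[w[1:].lower() for w in tw.text.split()
                              if w.startswith("#") and len(w) > 1])
        for tw in tweets
    ]


def label_by_topic(topic: str) -> Stage:
    """Toy stand-in for a learned classifier: 'relevant' if the topic entity occurs."""
    def stage(tweets: List[Tweet]) -> List[Tweet]:
        return [replace(tw, label="relevant" if topic in tw.entities else "other")
                for tw in tweets]
    return stage


def workflow(*stages: Stage) -> Stage:
    """Composes stages left to right into a single reusable workflow."""
    def run(tweets: List[Tweet]) -> List[Tweet]:
        return reduce(lambda batch, stage: stage(batch), stages, tweets)
    return run


# Usage: build and run a small workflow on made-up example texts.
pipeline = workflow(enrich, label_by_topic("earthquake"))
result = pipeline(collect([
    "Felt a strong #Earthquake just now",
    "Lunch time",
    "#earthquake #SF",
]))
print([t.label for t in result])
```

Keeping each stage a plain function over a batch makes stages easy to swap or reorder, which is the kind of customization the abstract attributes to workflows; a real platform would add streaming input, persistence and trained models.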
