Towards a Twitter observatory: A multi-paradigm framework for collecting, storing and analysing tweets

In this article we show how a multi-paradigm framework can fulfil the requirements of tweet analysis and reduce the waiting time for researchers who use computational resources and storage systems to support large-scale data analysis. The originality of our approach lies in combining data harvesting, data storage, data analysis and data visualisation in a single framework that supports inductive reasoning in multidisciplinary scientific research. Our main contribution is a polyglot storage system with a generic data model that supports logical data independence, together with a set of tools for mixing different types of algorithms in order to maximise the extraction of knowledge. We describe the software architecture of the framework and its generic model, and we show how it has been used in major projects and which of its characteristics have been validated.
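The following is a minimal sketch of what logical data independence over polyglot storage can mean. It is not the framework's implementation. The names `Tweet`, `DocumentBackend`, `GraphBackend` and `PolyglotStore` are hypothetical, and two in-memory classes stand in for real document and graph databases. Each ingested tweet is written once to every backend. Analysis code then asks questions through the facade, which routes each question to the storage paradigm best suited to it, so the analysis code never depends on how the data is physically stored.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """Hypothetical generic record: the only view of a tweet analysis code sees."""
    tweet_id: str
    author: str
    text: str
    mentions: tuple = ()


class Backend(ABC):
    """Hypothetical interface that every storage paradigm implements."""

    @abstractmethod
    def save(self, tweet: Tweet) -> None: ...


class DocumentBackend(Backend):
    """In-memory stand-in for a document store: whole tweets keyed by id."""

    def __init__(self):
        self.docs = {}

    def save(self, tweet):
        self.docs[tweet.tweet_id] = tweet

    def by_author(self, author):
        return [t for t in self.docs.values() if t.author == author]


class GraphBackend(Backend):
    """In-memory stand-in for a graph store: weighted, directed mention edges."""

    def __init__(self):
        self.edges = defaultdict(int)  # (author, mentioned_user) -> count

    def save(self, tweet):
        for target in tweet.mentions:
            self.edges[(tweet.author, target)] += 1

    def in_degree(self, user):
        # Total number of times `user` has been mentioned by anyone.
        return sum(w for (_, dst), w in self.edges.items() if dst == user)


class PolyglotStore:
    """Facade: writes fan out to all backends, reads are routed by question type."""

    def __init__(self):
        self.documents = DocumentBackend()
        self.graph = GraphBackend()

    def ingest(self, tweet: Tweet) -> None:
        for backend in (self.documents, self.graph):
            backend.save(tweet)

    def tweets_by(self, author):
        return self.documents.by_author(author)

    def mention_count(self, user):
        return self.graph.in_degree(user)


if __name__ == "__main__":
    store = PolyglotStore()
    store.ingest(Tweet("1", "alice", "hello @bob", ("bob",)))
    store.ingest(Tweet("2", "carol", "@bob @alice hi", ("bob", "alice")))
    print(len(store.tweets_by("alice")))  # 1
    print(store.mention_count("bob"))     # 2
```

In this design, adding another paradigm means writing one more `Backend` subclass and registering it in `ingest`. The generic `Tweet` record and the analysis-facing methods stay unchanged, which is the property that logical data independence refers to.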
