Twitter analytics: Architecture, tools and analysis

We study the temporal behavior of messages arriving in a social network, specifically the tweets and re-tweets sent to President Barack Obama on Twitter. We characterize the inter-arrival times between tweets, the number of re-tweets, and the spatial coordinates (latitude, longitude) of the users who sent them. Modeling the arrival process of tweets on Twitter can help predict coordinated user behavior in social networks. Although much of the literature on social networks presents large volumes of collected data, the modeling and characterization of those data are rarely discussed, and the available datasets are usually expensive and not comprehensive. Here, we develop a software architecture that uses a Twitter application programming interface (API) to collect the tweets sent to specific users. We then extract the user IDs and exact timestamps of the tweets and use them to characterize the inter-arrival times between tweets and the number of re-tweets. Our results indicate that the arrival of new tweets to a user can be modeled as a Poisson process, while the number of re-tweets follows a geometric distribution. Our data-collection architecture is operating system (OS) independent. These results can be applied to study correlations between patterns of user behavior and user location.
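Under a Poisson arrival process, the inter-arrival times are independent and exponentially distributed with rate lambda, and the maximum-likelihood estimate of lambda is one over the mean gap. For a geometric distribution on {0, 1, 2, ...} with P(K = k) = (1 - p)^k p, the maximum-likelihood estimate of p is 1 / (1 + mean count). The minimal Python sketch below illustrates these two estimators. It is not the authors' pipeline. It assumes timestamps in seconds and re-tweet counts that start at zero, and its function names are hypothetical.

```python
from statistics import fmean


def interarrival_times(timestamps):
    """Return gaps between consecutive tweet timestamps (assumed in seconds)."""
    ts = sorted(timestamps)  # arrivals may be collected out of order
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def fit_poisson_rate(timestamps):
    """MLE of the Poisson arrival rate lambda (tweets per second).

    Under a Poisson process the gaps are i.i.d. exponential(lambda),
    so lambda_hat = 1 / mean(gaps).
    """
    gaps = interarrival_times(timestamps)
    if not gaps:
        raise ValueError("need at least two timestamps")
    mean_gap = fmean(gaps)
    if mean_gap <= 0:
        raise ValueError("timestamps must not all be identical")
    return 1.0 / mean_gap


def fit_geometric_p(retweet_counts):
    """MLE of p for a geometric distribution on {0, 1, 2, ...}.

    With P(K = k) = (1 - p)**k * p, the estimate is p_hat = 1 / (1 + mean(K)).
    """
    if not retweet_counts:
        raise ValueError("need at least one re-tweet count")
    return 1.0 / (1.0 + fmean(retweet_counts))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the collected dataset.
    print(fit_poisson_rate([0, 10, 30, 60]))  # gaps 10, 20, 30 -> rate 0.05
    print(fit_geometric_p([0, 1, 3]))         # mean 4/3 -> p = 3/7
```

Fitting the parameters does not show that the models fit the data. A goodness-of-fit check, such as comparing the empirical gap distribution with the fitted exponential, is still required.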