Detecting future social unrest in unprocessed Twitter data: “Emerging phenomena and big data”

We have implemented a social media data mining system capable of forecasting events related to Latin American social unrest. Our method directly extracts a small number of tweets from publicly available data on twitter.com, condenses similar tweets into coherent forecasts, and assembles a detailed and easily interpretable audit trail that allows end users to quickly collect information about an upcoming event. Our system functions by continually applying multiple textual and geographic filters to a large volume of data streaming from twitter.com via the public API as well as a commercial data feed. Specifically, we search the entirety of twitter.com for a few carefully chosen keywords, search within those tweets for mentions of future dates, filter again using various logistic regression classifiers, and finally assign a location to an event by geocoding retweeters. Geocoding is done using our previously developed in-house geocoding service which, at the time of this writing, can infer the home location for over 62M twitter.com users [1]. Additionally, we identify demographics likely interested in an upcoming event by searching retweeters' recent posts for demographic-specific keywords.
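
As a rough illustration of the filter cascade described above, the following is a minimal sketch in Python, not the system's actual implementation. The keyword list, the Spanish date pattern, the 60-day horizon, the tweet dictionary layout, and the function names are all assumptions made for the example. The logistic regression stage is omitted, and the in-house geocoding service is stood in for by a plain dictionary that maps user IDs to home locations; the event location is taken as the most common home location among retweeters.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical keyword list; the actual keywords are not given in the text.
KEYWORDS = ("protesta", "marcha", "huelga")

MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "septiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Matches Spanish dates such as "15 de marzo" (an assumed, simplified pattern).
DATE_RE = re.compile(r"\b(\d{1,2}) de (" + "|".join(MONTHS) + r")\b", re.IGNORECASE)


def has_keyword(text):
    """Stage 1: keep only tweets that mention one of the chosen keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def future_date(text, posted, horizon_days=60):
    """Stage 2: return the first mentioned date that falls after `posted`
    and within `horizon_days` of it, or None if there is no such date."""
    for day, month in DATE_RE.findall(text):
        # Try the posting year first, then the next year (December -> January).
        for year in (posted.year, posted.year + 1):
            try:
                candidate = date(year, MONTHS[month.lower()], int(day))
            except ValueError:
                continue  # impossible calendar date, e.g. "31 de febrero"
            if 0 < (candidate - posted).days <= horizon_days:
                return candidate
    return None


def likely_location(retweeter_ids, home_locations):
    """Final stage: most common known home location among retweeters."""
    known = [home_locations[u] for u in retweeter_ids if u in home_locations]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]


def forecast(tweet, home_locations):
    """Run the cascade on one tweet; return a forecast dict or None."""
    if not has_keyword(tweet["text"]):
        return None
    when = future_date(tweet["text"], tweet["posted"])
    if when is None:
        return None
    return {
        "date": when,
        "location": likely_location(tweet["retweeters"], home_locations),
        "source": tweet["text"],  # kept so the forecast can be traced back
    }
```

In this sketch each stage discards tweets as early as possible, so the later stages only see the small fraction of tweets that pass the keyword and date filters. Retaining the source text in the result is one simple way to support the kind of audit trail mentioned above.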