The geography of Twitter topics in London

Social media data are increasingly seen as an alternative to public attitude surveys because they are available in large volumes, are time-stamped and are (sometimes) precisely located. Such data can be mined to give planners, marketers and researchers useful information about activities and opinions across time and space. In their raw form, however, textual data remain difficult to analyse coherently, and Twitter streams pose particular interpretive challenges because each message is restricted to 140 characters. This paper explores the use of an unsupervised learning algorithm to classify geo-tagged Tweets from Inner London, recorded during typical weekdays throughout 2013, into a small number of groups following extensive text cleaning. Our classification identifies 20 distinctive and interpretable topic groupings, which represent key types of Tweets, ranging from descriptions of activities and informal conversations between users to the use of check-in applets. Our motivation is to use the classification to demonstrate how the content posted on Twitter varies according to the characteristics of places and users. Topics and attitudes expressed through Tweets are found to vary substantially across Inner London and by time of day. Some of the observed variation in behaviour on Twitter can be attributed to the inferred demographic and socio-economic characteristics of users, but place and local activities can also exert a considerable influence. Overall, the classification provides a valuable framework for investigating the content and coverage of Twitter usage across Inner London.
