Extracting Diurnal Patterns of Real World Activity from Social Media

In this study, we develop methods to identify verbal expressions in social media streams that refer to real-world activities. Using aggregate daily patterns of Foursquare checkins as a reference, our methods extract content with similar patterns from Twitter, extending the amount of available content while preserving high relevance. We devise and test several extraction methods based on time-series and semantic similarity. Evaluating on key activity categories available from Foursquare (coffee, food, shopping, and nightlife), we show that our extraction methods capture equivalent patterns in Twitter. By examining rudimentary categories of activity such as nightlife, food, or shopping, we peek at the fundamental rhythm of human behavior and observe when it is disrupted. We use data compiled during the abnormal conditions in New York City throughout Hurricane Sandy to examine the outcome of our methods.
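
As a rough illustration of the time-series side of this idea, the sketch below ranks candidate Twitter terms by how closely their hour-of-day profile follows a reference checkin profile. It is a minimal sketch under stated assumptions, not the method used in the study: the function names (`hourly_profile`, `pearson`, `rank_terms`), the 24-bin hourly histogram, and the choice of Pearson correlation as the similarity measure are all hypothetical choices made for the example.

```python
from math import sqrt


def hourly_profile(hours, bins=24):
    """Turn a list of event hours into a normalized hour-of-day histogram."""
    counts = [0] * bins
    for h in hours:
        counts[h % bins] += 1
    total = sum(counts)
    # An empty input yields an all-zero profile.
    return [c / total for c in counts] if total else counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_terms(reference_hours, term_hours, top_k=3):
    """Rank terms by similarity of their diurnal profile to the reference profile.

    reference_hours: hours of day of checkins in one activity category (assumed input).
    term_hours: dict mapping a term to the hours of day of tweets containing it.
    """
    reference = hourly_profile(reference_hours)
    scored = [
        (term, pearson(reference, hourly_profile(hours)))
        for term, hours in term_hours.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Given checkin hours for a category such as coffee and tweet hours for a set of candidate terms, `rank_terms` returns the terms whose daily rhythm most resembles the category's. A semantic-similarity step, which the abstract also mentions, would be needed to filter out terms that merely share a schedule with the activity.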
