"Hey #311, Come Clean My Street!": A Spatio-temporal Sentiment Analysis of Twitter Data and 311 Civil Complaints

Twitter data has been applied to a wide range of problems (e.g., political election prediction and disease tracking); however, no studies have explored the interactions and potential relationships between Twitter data and social events recorded by government entities. In this paper, we introduce a novel approach to investigate the spatio-temporal relationships between the sentiment of tweets and 311 civil complaints recorded in the 311 Case Database, which is freely available from the City of San Francisco. We also present results from two supporting tasks: (1) we apply sentiment analysis techniques to model the emotional characteristics of five metropolitan areas around the globe, allowing one to gain insight into the relative happiness across cities and across neighborhoods within a city, and (2) we quantify the performance of several open-source machine learning algorithms for sentiment analysis by applying them to a large volume of Twitter data, thereby providing empirical guidelines for practitioners. Major contributions and findings include the following: (1) we have developed a system for the relative ranking of happiness of geographical areas, and our results show that Sydney, Australia is the happiest of the five cities under study; (2) we have found a counterintuitive positive correlation between 311-report frequency and local sentiment; and (3) when performing sentiment analysis of tweets, the inclusion of emoticons in the training dataset can lead to model overfitting, whereas NLP-based features appear to have great potential to improve classification accuracy.
