Location Classification Based on Tweets

Location classification associates a type with each location, enriching maps and supporting the many geospatial applications that rely on location types. Classification can be performed by humans, but machine learning is more efficient and reacts faster to changes; it can be used in lieu of human classification or to support it. In this paper we study the use of machine learning for Geosocial Location Classification, where the type of a site, e.g., a building, is discovered from social-media posts, e.g., tweets. Our goal is to correctly associate a set of tweets posted within a small radius of a given location with the corresponding location type, e.g., school, church, restaurant or museum. We explore two approaches to the problem: (a) a pipeline approach, where each post is first classified, and then the type of the location associated with the set of posts is inferred from the individual post labels; and (b) a joint approach, where the individual posts are processed simultaneously to yield the desired location type. We tested the two approaches on a data set of geotagged tweets. Our results demonstrate the superiority of the joint approach. Moreover, we show that due to the unique structure of the problem, where weakly related messages are jointly processed to yield a single final label, linear classifiers outperform deep neural network alternatives.
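
To make the distinction between the two approaches concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a tiny multinomial naive Bayes model (linear in bag-of-words counts in log space) as a stand-in for a linear classifier, majority voting as the aggregation step of the pipeline, and plain concatenation of the tweets as the joint input. All names (`NaiveBayes`, `build_models`, `pipeline_predict`, `joint_predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, kept deliberately simple)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())

        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.doc_counts[label] / n_docs)
            return prior + sum(
                math.log((counts[w] + 1) / total) for w in tokenize(doc)
            )

        # sorted() makes tie-breaking between labels deterministic.
        return max(sorted(self.doc_counts), key=score)


def build_models(train):
    """train maps a location type to a list of locations, each a list of tweets."""
    tweets, tweet_labels, joined, location_labels = [], [], [], []
    for label, locations in train.items():
        for location_tweets in locations:
            tweets.extend(location_tweets)
            tweet_labels.extend([label] * len(location_tweets))
            joined.append(" ".join(location_tweets))
            location_labels.append(label)
    tweet_model = NaiveBayes().fit(tweets, tweet_labels)        # one example per tweet
    location_model = NaiveBayes().fit(joined, location_labels)  # one example per location
    return tweet_model, location_model


def pipeline_predict(tweet_model, tweets):
    """(a) Pipeline: label every tweet, then take a majority vote over the labels."""
    votes = Counter(tweet_model.predict(t) for t in tweets)
    return votes.most_common(1)[0][0]


def joint_predict(location_model, tweets):
    """(b) Joint: process all tweets of the location together as one input."""
    return location_model.predict(" ".join(tweets))


# Hypothetical toy data, for illustration only.
TOY_TRAIN = {
    "school": [
        ["homework due tomorrow", "late for class again"],
        ["exam week", "class starts at nine"],
    ],
    "restaurant": [
        ["best pizza ever", "waiting for a table"],
        ["dinner with friends", "the pasta is great"],
    ],
}

if __name__ == "__main__":
    tweet_model, location_model = build_models(TOY_TRAIN)
    unseen = ["class was cancelled", "so much homework"]
    print("pipeline:", pipeline_predict(tweet_model, unseen))
    print("joint:   ", joint_predict(location_model, unseen))
```

The toy data is far too small to say anything about which approach performs better; the sketch only shows where the two differ, namely whether aggregation happens over per-tweet labels or over the tweets themselves.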
