End-to-end Network for Twitter Geolocation Prediction and Hashing

We propose an end-to-end neural network to predict the geolocation of a tweet. The network takes as input several raw Twitter metadata fields, such as the tweet message and the associated user account information. Our model is language independent and, despite minimal feature engineering, it is interpretable and capable of learning location-indicative words and timing patterns. It outperforms state-of-the-art systems by 2%-6%. Additionally, we propose extensions to the model that compress the representation learnt by the network into binary codes. Experiments show that it produces codes that are compact compared with those of benchmark hashing algorithms. An implementation of the model is publicly released.
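To make the idea of binary codes concrete, the following is a minimal sketch of how a real-valued representation could be turned into a binary code and compared by Hamming distance. It is not the method of the paper, which is not specified in this abstract: the sign-threshold rule, the function names `binarize`, `hamming`, and `nearest`, and the toy 4-dimensional vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def binarize(vector: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued vector into an int, one bit per dimension.

    Assumption: simple thresholding stands in for whatever hashing
    extension the paper actually uses.
    """
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: List[int]) -> int:
    """Return the index of the stored code closest to the query."""
    return min(range(len(codes)), key=lambda i: hamming(query, codes[i]))


# Hypothetical learnt representations for three tweets.
embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.8, -0.1, 0.3],
    [0.7, -0.6, 0.2, -0.9],
]
codes = [binarize(v) for v in embeddings]

# Look up the stored tweet whose code is closest to a new representation.
query_code = binarize([0.8, -0.1, 0.5, 0.2])
match_index = nearest(query_code, codes)
```

Each code here occupies four bits instead of four floating-point values, and the lookup uses only XOR and bit counting, which is the general motivation for hashing a learnt representation.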
