Predicting Named Entity Location Using Twitter

A knowledge base contains a set of concepts, entities, attributes, and relations. Knowledge bases are increasingly critical to a wide variety of applications in both industry and academia, yet they remain far from complete. As the world evolves, new entities emerge, and enriching existing knowledge bases with these entities and their location attribute values becomes more and more important. Twitter is one of the most popular micro-blogging platforms, and named entities are mentioned frequently in its huge collection of tweets, which contain abundant geographical location knowledge. Given a named entity and a set of tweets in which the entity appears, we are interested in predicting the entity's city-level location using the knowledge embedded in those tweets. This task is helpful for many applications, such as knowledge base enrichment, tweet location prediction, and entity search. In this paper we propose NELPT, the first unsupervised framework for Named Entity city-level Location Prediction that leverages the geographical location knowledge in Twitter. The framework uses a Linear Neural Network as its predictive model, combining two categories of information: (1) local count information and (2) global distributional information. We propose a learning algorithm based on the expectation-maximization (EM) method that learns the parameters of the Linear Neural Network automatically, without requiring any training data. Experimental results on a real-world Twitter data set show that our framework significantly outperforms the baselines in terms of accuracy and scales very well.
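To make the idea of a linear model trained without labels more concrete, the following is a minimal sketch, not the NELPT algorithm itself. It assumes each candidate city for an entity is described by two hypothetical feature values (a local count score and a global distributional score), scores cities with a weighted sum, and uses a hard-EM style loop: the current best city serves as a pseudo-label, and the weights are then nudged toward it. The function names, the toy feature values, the learning rate, and the softmax update are all illustrative assumptions.

```python
import math

def score(weights, features):
    # Linear combination of the two feature values for one candidate city.
    return sum(w * f for w, f in zip(weights, features))

def predict(weights, candidates):
    # Return the candidate city with the highest linear score.
    return max(candidates, key=lambda city: score(weights, candidates[city]))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_hard_em(entities, n_iters=20, lr=0.5):
    # entities: {entity: {city: (local_count_feature, global_feature)}}
    weights = [1.0, 1.0]  # assumed uniform starting point
    for _ in range(n_iters):
        grad = [0.0, 0.0]
        for candidates in entities.values():
            cities = list(candidates)
            # E-step (hard): take the current best city as a pseudo-label.
            label = predict(weights, candidates)
            # M-step: gradient of the softmax log-likelihood of that pseudo-label.
            probs = softmax([score(weights, candidates[c]) for c in cities])
            for k in range(2):
                expected = sum(p * candidates[c][k] for p, c in zip(probs, cities))
                grad[k] += candidates[label][k] - expected
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights

# Toy, made-up feature values for illustration only.
entities = {
    "Space Needle": {"Seattle": (0.7, 0.6), "Portland": (0.2, 0.3), "Boston": (0.1, 0.1)},
    "Fenway Park": {"Boston": (0.5, 0.8), "Seattle": (0.5, 0.1)},
}
weights = fit_hard_em(entities)
predictions = {name: predict(weights, cands) for name, cands in entities.items()}
print(predictions)
```

In this toy data the local counts for "Fenway Park" are tied between two cities, so the global feature decides the prediction, which is the kind of complementary signal the two information categories are meant to provide.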
