Joint Recognition and Linking of Fine-Grained Locations from Tweets

Many users casually reveal their locations, such as restaurants, landmarks, and shops, in their tweets. Recognizing such fine-grained locations in tweets and then linking the location mentions to well-defined location profiles (e.g., with formal name, detailed address, and geo-coordinates) offer a tremendous opportunity for many applications. Unlike existing solutions, which perform location recognition and linking as two sequential sub-tasks in a pipeline, we propose a novel joint framework that performs location recognition and location linking simultaneously in a joint search space. We formulate this end-to-end location linking problem as a structured prediction problem and propose a beam-search-based algorithm. Based on the concept of multi-view learning, we further enable the algorithm to learn from unlabeled data to alleviate the dearth of labeled data. Extensive experiments are conducted to recognize locations mentioned in tweets and link them to location profiles in Foursquare. Experimental results show that the proposed joint learning algorithm outperforms state-of-the-art solutions, and that learning from unlabeled data improves both recognition and linking accuracy.

Gao Cong | Aixin Sun | Jialong Han | Zongcheng Ji
