Automatic Identification of Expressions of Locations in Tweet Messages using Conditional Random Fields

In this paper, we propose a model that automatically identifies expressions of locations (EoLs) in Twitter messages. We entered the model in the ALTA Shared Task 2014 competition, where our best-performing system ranked among the top 3 systems (2nd on the public leaderboard). The model is based on Conditional Random Fields (CRF), and we examine how useful a wide variety of lexical, structural and geospatial features are for this task. We also investigate the effectiveness of stacking and self-training.
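As an illustration of the kind of input a CRF sequence labeller works with, the following is a minimal sketch and not the system described above. It assumes a BIO labelling scheme and a toy gazetteer; the names `GAZETTEER`, `token_features` and `bio_to_spans`, and the specific features chosen, are hypothetical and stand in loosely for the lexical, structural and geospatial feature groups.

```python
from typing import Dict, List

# Hypothetical toy gazetteer (assumption): stands in for a real geospatial resource.
GAZETTEER = {"melbourne", "sydney", "flinders", "street"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i], of the kind a CRF toolkit consumes."""
    tok = tokens[i]
    return {
        # Lexical features
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        # Structural (tweet-specific) features
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        # Geospatial feature: gazetteer lookup, ignoring a leading '#'
        "in_gazetteer": tok.lower().lstrip("#") in GAZETTEER,
        # Context features from neighbouring tokens
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Collect location expressions from B/I/O labels (assumed scheme)."""
    spans: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B":
            # A new expression starts; close any open one first.
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab == "I" and current:
            current.append(tok)
        else:
            # "O", or a stray "I" with no open expression.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["Fire", "near", "Flinders", "Street", "in", "#Melbourne"]
labels = ["O", "O", "B", "I", "O", "B"]  # hand-written labels, not model output
features = [token_features(tokens, i) for i in range(len(tokens))]
spans = bio_to_spans(tokens, labels)
```

In a full pipeline, the per-token feature dicts would be passed to a CRF toolkit for training and prediction, and the predicted labels decoded back into EoLs with a routine like `bio_to_spans`.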
