Towards Deep Semantic Analysis of Hashtags

Hashtags are semantico-syntactic constructs used across social networking and microblogging platforms to let users start a topic-specific discussion or classify a post into a desired category. Segmenting and linking the entities present within hashtags could therefore help in better understanding and extracting the information shared across social media. However, because hashtags lack space delimiters (e.g., #nsavssnowden), segmenting a hashtag into its constituent entities (“NSA” and “Edward Snowden” in this case) is not a trivial task. Most current state-of-the-art social media analytics systems, such as those for sentiment analysis and entity linking, either ignore hashtags or treat them as a single word. In this paper, we present a context-aware approach that segments the entities in a hashtag and links each to a knowledge base (KB) entry, based on the context within the tweet. Our approach segments and links the entities in hashtags such that the coherence between the hashtag semantics and the tweet is maximized. To the best of our knowledge, no existing study addresses the issue of linking entities in hashtags for extracting semantic information. We evaluate our method on two different datasets and demonstrate that the additional semantic information obtained by segmenting and linking entities in a hashtag improves overall entity linking in tweets.
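The idea of choosing the segmentation that is most coherent with the tweet can be illustrated with a minimal sketch. This is not the method of the paper, whose scoring function is not given in the abstract. The toy knowledge base `TOY_KB`, the word-overlap `coherence` score, and the function names are all assumptions made for illustration only.

```python
from functools import lru_cache

# Hypothetical toy knowledge base (an assumption, not from the paper):
# surface form -> (entity title or None, words related to that entity)
TOY_KB = {
    "nsa": ("National Security Agency", {"surveillance", "leak", "agency", "spying"}),
    "snowden": ("Edward Snowden", {"leak", "whistleblower", "surveillance", "asylum"}),
    "vs": (None, set()),  # connective word that links to no entity
    "snow": ("Snow", {"winter", "weather", "cold"}),
    "den": ("Den (room)", {"room", "house"}),
}


def segmentations(tag):
    """Enumerate every split of the hashtag into known surface forms."""
    tag = tag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def split(start):
        # Reaching the end of the string yields one empty segmentation.
        if start == len(tag):
            return [[]]
        results = []
        for end in range(start + 1, len(tag) + 1):
            piece = tag[start:end]
            if piece in TOY_KB:
                for rest in split(end):
                    results.append([piece] + rest)
        return results

    return split(0)


def coherence(segments, tweet_words):
    """Assumed score: count of entity-related words that occur in the tweet."""
    return sum(len(TOY_KB[s][1] & tweet_words) for s in segments)


def segment_and_link(tag, tweet):
    """Pick the segmentation with the highest coherence and link its entities."""
    tweet_words = set(tweet.lower().split())
    candidates = segmentations(tag)
    if not candidates:
        return [], []
    best = max(candidates, key=lambda segs: coherence(segs, tweet_words))
    entities = [TOY_KB[s][0] for s in best if TOY_KB[s][0] is not None]
    return best, entities


segments, entities = segment_and_link(
    "#nsavssnowden", "new leak reveals more surveillance programs"
)
print(segments)
print(entities)
```

In this toy setting, the candidate "nsa / vs / snow / den" shares fewer related words with the tweet than "nsa / vs / snowden", so the latter is selected. A realistic system would replace the word-overlap score with a learned or KB-derived relatedness measure.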
