Wikification and Beyond: The Challenges of Entity and Concept Grounding

Contextual disambiguation and grounding of concepts and entities in natural language are essential to progress in many natural language understanding tasks and fundamental to many applications. Wikification aims to automatically identify concept mentions in text and link them to referents in a knowledge base (KB) such as Wikipedia. Consider the sentence, "The Times report on Blumenthal (D) has the potential to fundamentally reshape the contest in the Nutmeg State." A Wikifier should identify the key entities and concepts and map them to an encyclopedic resource (e.g., "D" refers to Democratic Party, and "the Nutmeg State" refers to Connecticut).

Wikification benefits both end-users and Natural Language Processing (NLP) systems. Readers can better comprehend Wikified documents because information about related topics is readily accessible. For systems, a Wikified document elucidates concepts and entities by grounding them in an encyclopedic resource or an ontology. Wikification output has improved downstream NLP tasks, including coreference resolution, user interest discovery, recommendation, and search.

This task has received increased attention in recent years from the NLP and Data Mining communities, partly fostered by the U.S. NIST Text Analysis Conference Knowledge Base Population (KBP) track, and several versions of it have been studied. These include Wikifying all concept mentions in a single text document; Wikifying a cluster of co-referential named entity mentions that appear across documents (Entity Linking); and Wikifying a whole document to a single concept. Other works relate this task to coreference resolution within and across documents and in the context of multiple text genres.
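To make the two steps of the task concrete (spotting mentions, then linking each to a KB referent), the following is a minimal sketch run on the example sentence. It assumes a tiny hand-built dictionary in place of Wikipedia: the `TOY_KB` table, its cue words, the distractor candidate for "D", and the overlap-based scoring are all illustrative assumptions, not a method taken from any particular system.

```python
import re

# Hypothetical toy KB (an assumption for illustration):
# surface form -> {candidate title: cue words that suggest that candidate}.
TOY_KB = {
    "the Nutmeg State": {"Connecticut": {"state", "contest"}},
    "D": {
        "D (programming language)": {"compiler", "code", "programming"},
        "Democratic Party": {"contest", "state", "election", "senate"},
    },
}

def wikify(text, kb=TOY_KB):
    """Return (mention, start_offset, linked_title) tuples sorted by offset."""
    # Bag of lowercased words in the whole text, used as disambiguation context.
    context = set(re.findall(r"[a-z]+", text.lower()))
    links = []
    for surface, candidates in kb.items():
        # Step 1: mention identification by exact, case-sensitive matching.
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            # Step 2: link to the candidate whose cue words overlap the context most.
            best = max(candidates, key=lambda title: len(candidates[title] & context))
            links.append((surface, match.start(), best))
    return sorted(links, key=lambda link: link[1])

sentence = ("The Times report on Blumenthal (D) has the potential to "
            "fundamentally reshape the contest in the Nutmeg State.")
for mention, offset, title in wikify(sentence):
    print(f"{mention!r} at offset {offset} -> {title}")
```

In this sketch "D" resolves to Democratic Party only because words such as "contest" and "State" appear in the same sentence; a realistic Wikifier would need far larger candidate sets and richer context and coherence signals.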
