Exploring Multiple Feature Spaces for Novel Entity Discovery

Continuously discovering novel entities in news and Web data is important for Knowledge Base (KB) maintenance. One of the key challenges is to decide whether an entity mention refers to an in-KB or out-of-KB entity. We propose a principled approach that learns a novel entity classifier by modeling mention and entity representation into multiple feature spaces, including contextual, topical, lexical, neural embedding and query spaces. Different from most previous studies that address novel entity discovery as a submodule of entity linking systems, our model is more a generalized approach and can be applied as a pre-filtering step of novel entities for any entity linking systems. Experiments on three real-world datasets show that our method significantly outperforms existing methods on identifying novel entities.

[1]  Giuseppe Ottaviano,et al.  Fast and Space-Efficient Entity Linking for Queries , 2015, WSDM.

[2]  Gerhard Weikum,et al.  Robust Disambiguation of Named Entities in Text , 2011, EMNLP.

[3]  Gerhard Weikum,et al.  Discovering emerging entities with ambiguous names , 2014, WWW.

[4]  Wei Shen,et al.  LINDEN: linking named entities with knowledge base via semantic knowledge , 2012, WWW.

[5]  Daniel S. Weld,et al.  Fine-Grained Entity Recognition , 2012, AAAI.

[6]  Ben Hachey,et al.  Entity Disambiguation with Web Links , 2015, TACL.

[7]  Paolo Ferragina,et al.  TAGME: on-the-fly annotation of short text fragments (by wikipedia entities) , 2010, CIKM.

[8]  Doug Downey,et al.  Local and Global Algorithms for Disambiguation to Wikipedia , 2011, ACL.

[9]  Oren Etzioni,et al.  No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities , 2012, EMNLP.

[10]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[11]  Ganesh Ramakrishnan,et al.  Collective annotation of Wikipedia entities in web text , 2009, KDD.

[12]  Silviu Cucerzan,et al.  Large-Scale Named Entity Disambiguation Based on Wikipedia Data , 2007, EMNLP.

[13]  Ben Hachey,et al.  Overview of TAC-KBP2014 Entity Discovery and Linking Tasks , 2015 .

[14]  Yang Li,et al.  Mining evidences for named entity disambiguation , 2013, KDD.

[15]  Jun Zhao,et al.  Collective entity linking in web text: a graph-based method , 2011, SIGIR.

[16]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[17]  Gonzalo Navarro,et al.  A guided tour to approximate string matching , 2001, CSUR.

[18]  Mark Dredze,et al.  Entity Linking: Finding Extracted Entities in a Knowledge Base , 2013, Multi-source, Multilingual Information Extraction and Summarization.

[19]  Xiaolong Wang,et al.  Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation , 2015, IJCAI.

[20]  Heng Ji,et al.  Overview of the TAC 2010 Knowledge Base Population Track , 2010 .

[21]  Jian Su,et al.  Entity Linking with Effective Acronym Expansion, Instance Selection, and Topic Modeling , 2011, IJCAI.

[22]  Paul N. Bennett,et al.  Predicting Query Performance via Classification , 2010, ECIR.

[23]  Silviu Cucerzan Name entities made obvious: the participation in the ERD 2014 evaluation , 2014, ERD '14.

[24]  Gerhard Weikum,et al.  YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract , 2013, IJCAI.

[25]  Paul N. Bennett,et al.  Refined experts: improving classification in large taxonomies , 2009, SIGIR.

[26]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[27]  G. Prasad LEARNING TO LINK ENTITIES WITH KNOWLEDGE BASE , 2016 .

[28]  Rada Mihalcea,et al.  Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM '07.

[29]  Gerhard Weikum,et al.  Fine-grained Semantic Typing of Emerging Entities , 2013, ACL.

[30]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[31]  Xianpei Han,et al.  A Generative Entity-Mention Model for Linking Entities with Knowledge Base , 2011, ACL.

[32]  Mark Dredze,et al.  Entity Disambiguation for Knowledge Base Population , 2010, COLING.

[33]  Jing Jiang,et al.  Linking Entities to a Knowledge Base with Query Expansion , 2011, EMNLP.

[34]  Silviu Cucerzan MSR System for Entity Linking at TAC 2012 , 2012, TAC.

[35]  Houfeng Wang,et al.  Learning Entity Representation for Entity Disambiguation , 2013, ACL.

[36]  Frank Klawonn,et al.  Fuzzy Clustering of Short Time-Series and Unevenly Distributed Sampling Points , 2003, IDA.

[37]  Jiawei Han,et al.  Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions , 2015, IEEE Transactions on Knowledge and Data Engineering.

[38]  Xin Li,et al.  Constraint-Based Entity Matching , 2005, AAAI.

[39]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .