Populating knowledge base with collective entity mentions: A graph-based approach

Populating a knowledge base with new entity mentions extracted from unstructured text can improve its coverage and freshness. The task naturally consists of two subtasks: fine-grained entity classification and entity linking. Existing studies often focus on only one of these subtasks, and they usually populate the entity mentions in a text independently, implicitly assuming that the mentions are unrelated. However, mentions in the same text are often semantically related, so it is better to populate them into the knowledge base collectively. To address both problems, we propose CIIGA, a unified collective inference approach based on an interdependence graph, for populating a knowledge base with collective entities. CIIGA jointly determines the proper locations of all entity mentions in the same text by exploiting their interdependence relationships. Experimental results show that CIIGA achieves a significant accuracy improvement over the baseline approach, APOLLO, on the task of knowledge base population with multiple entities.
