Name Disambiguation Using Semi-supervised Topic Model

Name ambiguity is attracting increasing attention. As the amount of information available on the Web grows, name disambiguation is becoming one of the most challenging tasks. For example, different people may share the same personal name, so an online search often returns many documents that are irrelevant to the user. To address this problem, we use the topic coherence principle to resolve the ambiguity of name entities and propose a semi-supervised topic model (STM). Wikipedia's hierarchical structure information enriches the semantics of the name entity. Information extracted from Wikipedia is organized into a knowledge base, which is used to match the query entity. By utilizing the context of the given query entity, the proposed model attempts to disambiguate its various meanings. Experiments on two real-life datasets show that STM outperforms the baselines (ETM and WPAM), achieving an accuracy of 84.75%. The results show that our method is promising for name disambiguation, and our work can provide valuable insights into entity disambiguation.
