Dirichlet Process Mixtures Model Based on Variational Inference for Chinese Person Name Disambiguation

Person name ambiguity is very common in Web search results. Although many methods have been proposed to resolve it, their accuracy on complex, heterogeneous webpages still needs improvement. We introduce a variational inference algorithm for the Dirichlet process mixture model (DPMM) that clusters webpage text in order to disambiguate person names. Experiments on web data from different search engines indicate that our approach consistently outperforms other clustering methods such as K-means clustering and agglomerative hierarchical clustering.
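The abstract does not give the update equations. The following is a minimal sketch, not the authors' implementation, of mean-field variational inference for a truncated stick-breaking DPMM in the style of Blei and Jordan (2006). It assumes each webpage is already segmented into tokens and modeled as a multinomial bag of words. The helper names `digamma` and `dpmm_cluster`, the hyperparameters `alpha`, `eta`, `truncation`, `iters`, and the one-cluster-per-document initialization are all illustrative assumptions. Each resulting cluster is read as one distinct person sharing the queried name.

```python
import math
from collections import Counter


def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def dpmm_cluster(docs, truncation=10, alpha=1.0, eta=0.1, iters=100):
    """Hypothetical sketch: mean-field VI for a truncated DP mixture of multinomials.

    docs: list of token lists (one per webpage). Returns (hard labels, responsibilities).
    """
    counts = [Counter(d) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    T, N = truncation, len(docs)
    # Assumed deterministic init: document n starts in cluster n % T. With T >= N every
    # page gets its own cluster, which breaks the symmetry between mixture components.
    phi = [[1.0 if t == n % T else 0.0 for t in range(T)] for n in range(N)]

    for _ in range(iters):
        # q(v_t) = Beta(g1[t], g2[t]) for the stick-breaking proportions, t < T - 1.
        mass = [sum(phi[n][t] for n in range(N)) for t in range(T)]
        g1 = [1.0 + mass[t] for t in range(T - 1)]
        g2 = [alpha + sum(mass[t + 1:]) for t in range(T - 1)]
        # q(beta_t) = Dirichlet(lam[t]) over the vocabulary for each cluster.
        lam = [{w: eta for w in vocab} for _ in range(T)]
        for n in range(N):
            for t in range(T):
                for w, c in counts[n].items():
                    lam[t][w] += phi[n][t] * c
        # E[log pi_t] = E[log v_t] + sum_{i<t} E[log(1 - v_i)].
        e_log_pi, acc = [], 0.0
        for t in range(T):
            if t < T - 1:
                total = digamma(g1[t] + g2[t])
                e_log_pi.append(acc + digamma(g1[t]) - total)
                acc += digamma(g2[t]) - total
            else:
                e_log_pi.append(acc)  # truncation sets v_T = 1, so E[log v_T] = 0
        # E[log beta_{t,w}] under each cluster's Dirichlet.
        e_log_beta = []
        for t in range(T):
            norm = digamma(sum(lam[t].values()))
            e_log_beta.append({w: digamma(v) - norm for w, v in lam[t].items()})
        # q(z_n) = Multinomial(phi[n]), normalized with a numerically stable softmax.
        for n in range(N):
            s = [e_log_pi[t] + sum(c * e_log_beta[t][w] for w, c in counts[n].items())
                 for t in range(T)]
            m = max(s)
            e = [math.exp(x - m) for x in s]
            z = sum(e)
            phi[n] = [x / z for x in e]

    labels = [max(range(T), key=lambda t: phi[n][t]) for n in range(N)]
    return labels, phi
```

For example, `dpmm_cluster([["professor", "university"], ["football", "goal"]], truncation=8)` returns one hard cluster label per page together with the soft responsibilities. Unlike K-means, the number of clusters is not fixed in advance: the truncation level only bounds it, and the stick-breaking posterior decides how many clusters receive non-negligible mass.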