A Large-scale Test Set for Author Disambiguation

To overcome article-oriented search functions and provide author-oriented ones, a namesake problem for author names should be solved. Author disambiguation, proposed as its solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported above 90% performance, there are few academic information services which adopt author-resolving functions. This paper describes a large-scale test set for author disambiguation which was created by KISTI to foster author resolution researches. The result of these researches can be applied to academic information systems and make better service. The test set was constructed from DBLP data through web searches and manual inspection, Currently it consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

[1]  Dongwon Lee,et al.  Search engine driven author disambiguation , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).

[2]  Won-Kyung Sung,et al.  On co-authorship for author disambiguation , 2009, Inf. Process. Manag..

[3]  Nigel Shadbolt,et al.  Also by the same author: AKTiveAuthor, a citation graph approach to name disambiguation , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).

[4]  Yang Song,et al.  Efficient topic-based unsupervised name disambiguation , 2007, JCDL '07.

[5]  Hui Han,et al.  Name disambiguation in author citations using a K-way spectral clustering method , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).