Author Graph Generation based on Author Disambiguation

ABSTRACT While the nodes of an ideal author graph should represent individual authors, automatically generated author graphs mostly use author names as nodes because resolving author names into individuals is difficult. However, using author names as nodes merges namesakes, who should be separate nodes, into a single node, which may distort the characteristics of the author graph. This study proposes an algorithm that resolves author ambiguities based on co-authorship and then yields an author graph whose nodes are authors rather than author names. The scientific collaboration relationships on which this algorithm depends tend to produce clustering results that minimize the over-clustering error at the expense of the under-clustering error. In experiments, the algorithm is applied to real citation records in which Korean namesakes occur, and the results are discussed.

KEYWORDS Author Graph, Author Disambiguation, Community Detection
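The general idea of co-authorship-based disambiguation can be illustrated with a minimal sketch. This is not the algorithm proposed in the study, whose details the abstract does not give. It assumes a simple rule (two papers listing the same ambiguous name are attributed to the same individual when they share at least one other co-author name), and the function names `disambiguate` and `author_graph`, the `name#k` node labels, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def disambiguate(papers, target):
    """Group the papers listing `target` into presumed individuals.

    Assumed rule (illustrative only): two papers are merged when they
    share at least one co-author name other than `target`.
    """
    idx = [i for i, authors in enumerate(papers) if target in authors]
    parent = {i: i for i in idx}  # union-find forest over paper indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen = {}  # co-author name -> first paper where it appeared with target
    for i in idx:
        for name in papers[i]:
            if name == target:
                continue
            if name in seen:
                parent[find(i)] = find(seen[name])
            else:
                seen[name] = i

    clusters = defaultdict(list)
    for i in idx:
        clusters[find(i)].append(i)
    return sorted(clusters.values())


def author_graph(papers, target):
    """Return co-authorship edges with `target` split into author nodes."""
    label = {}
    for k, cluster in enumerate(disambiguate(papers, target)):
        for i in cluster:
            label[i] = f"{target}#{k}"
    edges = set()
    for i, authors in enumerate(papers):
        nodes = sorted({label[i] if a == target else a for a in authors})
        edges.update(combinations(nodes, 2))
    return edges


# Hypothetical records: each paper is a list of author names.
papers = [
    ["Kim J.", "Lee S."],
    ["Kim J.", "Lee S.", "Park H."],
    ["Kim J.", "Choi M."],
    ["Park H.", "Kim J."],
]
clusters = disambiguate(papers, "Kim J.")
edges = author_graph(papers, "Kim J.")
```

With these sample records, papers 0, 1, and 3 are linked through shared co-authors and become one author node, while paper 2 becomes a second node for the same name. Under this toy rule, a single person who publishes with disjoint sets of co-authors would be split into several nodes, and the other co-author names are treated as unambiguous, which a real system could not assume.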
