Unsupervised Author Disambiguation using Heterogeneous Graph Convolutional Network Embedding

Different people often share the same name. When a digital library user searches for an author name, the results may mix publications by different authors who share that name. Distinguishing between these authors is an important prerequisite for improving the quality of services and content in digital libraries. The general task of author disambiguation is to assign publications that carry an identical name, or names with highly similar spellings, to the distinct people who wrote them. In recent years, much research has addressed this challenging task. However, some approaches rely heavily on external knowledge bases and manually annotated data, and some unsupervised approaches require complex feature engineering. In this paper, we propose a novel and efficient author disambiguation framework that needs no labeled data. We first construct a heterogeneous publication network for each ambiguous name. We then use our proposed heterogeneous graph convolutional network embedding method, which encodes both graph structure and node attribute information, to learn publication representations. Next, we propose a graph-enhanced clustering method for name disambiguation that greatly accelerates clustering and does not require the number of distinct persons to be known in advance. Our framework can be continually retrained and applied to incremental disambiguation as new publications arrive. Experimental results on two datasets show that our framework clearly outperforms several state-of-the-art methods for author disambiguation.
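To make the first step of the pipeline concrete, the following is a minimal sketch, not the method proposed in the paper. It builds a small typed publication graph for one ambiguous name and groups publications by connected components, which illustrates how a graph can yield clusters without fixing the number of distinct persons beforehand. The record layout (`authors`, `venue`), the relation names `coauthor` and `venue`, and both function names are assumptions made for illustration; the graph convolutional embedding and the graph-enhanced clustering described above are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_publication_graph(publications, ambiguous_name):
    """Link publications that share a co-author or a venue.

    `publications` maps a publication id to a dict with the hypothetical
    keys "authors" (list of names) and "venue" (string). The result maps
    each publication id to {neighbor id: set of relation types}.
    """
    # Index publications by each (relation, value) pair they carry.
    index = defaultdict(set)
    for pub_id, pub in publications.items():
        for author in pub["authors"]:
            # The ambiguous name appears on every record, so it carries no signal.
            if author != ambiguous_name:
                index[("coauthor", author)].add(pub_id)
        index[("venue", pub["venue"])].add(pub_id)

    graph = {pub_id: defaultdict(set) for pub_id in publications}
    for (relation, _value), pub_ids in index.items():
        for a, b in combinations(sorted(pub_ids), 2):
            graph[a][b].add(relation)
            graph[b][a].add(relation)
    return graph


def connected_components(graph):
    """Group publication ids into clusters of mutually reachable nodes."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


# Invented toy records for the ambiguous name "J. Smith".
example_publications = {
    "p1": {"authors": ["J. Smith", "A. Lee"], "venue": "KDD"},
    "p2": {"authors": ["J. Smith", "A. Lee", "C. Wu"], "venue": "CIKM"},
    "p3": {"authors": ["J. Smith", "B. Chen"], "venue": "WWW"},
}
example_graph = build_publication_graph(example_publications, "J. Smith")
example_clusters = connected_components(example_graph)
```

In this toy data, `p1` and `p2` are linked through a shared co-author while `p3` stays isolated, so the number of clusters falls out of the graph itself. Connected components are a deliberately crude stand-in: a single spurious edge merges two people, which is one reason learned representations are combined with graph structure in practice.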
