Pairwise Learning for Name Disambiguation in Large-Scale Heterogeneous Academic Networks

Name disambiguation aims to identify unique authors with the same name. Existing name disambiguation methods always exploit author attributes to enhance disambiguation results. However, some discriminative author attributes (e.g., email and affiliation) may change because of graduation or job-hopping, which will result in the separation of the same author's papers in digital libraries. Although these attributes may change, an author's co-authors and research topics do not change frequently with time, which means that papers within a period have similar text and relation information in the academic network. Inspired by this idea, we introduce Multi-view Attention-based Pairwise Recurrent Neural Network (MA-PairRNN) to solve the name disambiguation problem. We divided papers into small blocks based on discriminative author attributes and blocks of the same author will be merged according to pairwise classification results of MA-PairRNN. MA-PairRNN combines heterogeneous graph embedding learning and pairwise similarity learning into a framework. In addition to attribute and structure information, MA-PairRNN also exploits semantic information by meta-path and generates node representation in an inductive way, which is scalable to large graphs. Furthermore, a semantic-level attention mechanism is adopted to fuse multiple meta-path based representations. A Pseudo-Siamese network consisting of two RNNs takes two paper sequences in publication time order as input and outputs their similarity. Results on two real-world datasets demonstrate that our framework has a significant and consistent improvement of performance on the name disambiguation task. It was also demonstrated that MA-PairRNN can perform well with a small amount of training data and have better generalization ability across different research areas.

[1]  Jianyong Wang,et al.  On Graph-Based Name Disambiguation , 2011, JDIQ.

[2]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[3]  C. Lee Giles,et al.  Hybrid Deep Pairwise Classification for Author Name Disambiguation , 2019, CIKM.

[4]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[5]  Philip S. Yu,et al.  Multi-information Source HIN for Medical Concept Embedding , 2020, PAKDD.

[6]  Philip S. Yu,et al.  Fine-grained Event Categorization with Heterogeneous Graph Convolutional Networks , 2019, IJCAI.

[7]  C. Lee Giles,et al.  Two supervised learning approaches for name disambiguation in author citations , 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004..

[8]  Byung-Won On,et al.  Effective and scalable solutions for mixed and split citation problems in digital libraries , 2005, IQIS '05.

[9]  Jianxin Li,et al.  Large-Scale Hierarchical Text Classification with Recursively Regularized Deep Graph-CNN , 2018, WWW.

[10]  Jie Tang,et al.  Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop. , 2018, KDD.

[11]  Razvan C. Bunescu,et al.  Using Encyclopedic Knowledge for Named entity Disambiguation , 2006, EACL.

[12]  Rong Pan,et al.  Logic Attention Based Neighborhood Aggregation for Inductive Knowledge Graph Embedding , 2018, AAAI.

[13]  Philip S. Yu,et al.  Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification , 2019, IEEE Transactions on Knowledge and Data Engineering.

[14]  Sankar K. Pal,et al.  Multilayer perceptron, fuzzy sets, and classification , 1992, IEEE Trans. Neural Networks.

[15]  Chen Gao,et al.  AdversarialNAS: Adversarial Neural Architecture Search for GANs , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[17]  Yu He,et al.  HeteSpaceyWalk: A Heterogeneous Spacey Random Walk for Heterogeneous Information Network Embedding , 2019, CIKM.

[18]  Zhoujun Li,et al.  Diabetes-Associated Factors as Predictors of Nursing Home Admission and Costs in the Elderly Across Europe. , 2017, Journal of the American Medical Directors Association.

[19]  Gilles Louppe,et al.  Ethnicity Sensitive Author Disambiguation Using Semi-supervised Learning , 2015, KESW.

[20]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[21]  Jaewoo Kang,et al.  Graph Transformer Networks , 2019, NeurIPS.

[22]  Mohammad Al Hasan,et al.  Name Disambiguation in Anonymized Graphs using Network Embedding , 2017, CIKM.

[23]  Philip S. Yu,et al.  Improving stock market prediction via heterogeneous information fusion , 2017, Knowl. Based Syst..

[24]  Yanfang Ye,et al.  Heterogeneous Graph Attention Network , 2019, WWW.

[25]  Luo Si,et al.  Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion , 2013, SIGIR.

[26]  Ryan A. Rossi,et al.  Deep Feature Learning for Graphs , 2017, ArXiv.