Diting: An Author Disambiguation Method Based on Network Representation Learning

Disambiguating names shared by different persons is important in many scenarios. In this work, we propose an unsupervised method, Diting, and a semi-supervised method, Diting++, for author disambiguation. In Diting, we learn a low-dimensional vector to represent each paper in networks formed by connecting papers through multiple types of relationships (such as the co-author relationship). During representation learning, we focus on maximizing the gap between positive edges and negative edges. We further propose a clustering algorithm that associates papers with their real-life authors. To make full use of authorship information, which is easy to obtain from authors' homepages, we design Diting++ to improve disambiguation performance. Diting++ uses the authorship information listed on the authors' homepages to construct label networks, and applies a network representation learning method to learn paper representations from the label networks together with the other networks. Diting++ then uses a semi-supervised clustering method to partition the learned paper representations into disjoint groups, each belonging to a distinct author. By exploiting the label information, the clustering method places papers written by the same author in the same group and papers written by different authors in different groups. Through extensive experiments, we show that our methods significantly outperform state-of-the-art author disambiguation methods.

INDEX TERMS network representation learning, network embedding, author disambiguation
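The abstract does not spell out the training objective or the clustering procedure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes papers are scored by a dot product, models "maximizing the gap between positive and negative edges" as a hinge (max-margin) loss with one sampled negative per edge, and merges all relationship types into one edge list. The clustering step is a swapped-in stand-in (threshold-based single-linkage with union-find), not Diting's algorithm; for the Diting++ setting it turns homepage authorship labels into must-link (same author) and cannot-link (different authors) constraints. All names (`score`, `cosine`, `hinge_loss`, `train_margin_embeddings`, `cluster_papers`) and parameters (`margin`, `threshold`, `labels`) are hypothetical.

```python
import math
import random


def score(u, v):
    """Dot-product similarity between two paper vectors (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(score(u, u)), math.sqrt(score(v, v))
    return 0.0 if nu == 0.0 or nv == 0.0 else score(u, v) / (nu * nv)


def hinge_loss(ei, ej, ek, margin=1.0):
    """Max-margin loss for a positive edge (i, j) against a negative pair
    (i, k): zero once score(i, j) beats score(i, k) by at least `margin`."""
    return max(0.0, margin - score(ei, ej) + score(ei, ek))


def train_margin_embeddings(num_papers, pos_edges, dim=8, margin=1.0,
                            lr=0.05, epochs=200, seed=0):
    """Hypothetical trainer: SGD on hinge_loss with one uniformly sampled
    negative paper per positive edge. `pos_edges` is one merged edge list;
    Diting's multiple relationship networks are not modeled separately."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
           for _ in range(num_papers)]
    neighbors = {p: set() for p in range(num_papers)}
    for i, j in pos_edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    for _ in range(epochs):
        for i, j in pos_edges:
            # Negative sample: a paper that is neither i nor linked to i.
            candidates = [k for k in range(num_papers)
                          if k != i and k not in neighbors[i]]
            if not candidates:
                continue
            k = rng.choice(candidates)
            if hinge_loss(emb[i], emb[j], emb[k], margin) <= 0.0:
                continue  # gap already wide enough: no update
            ei, ej, ek = emb[i][:], emb[j][:], emb[k][:]
            for d in range(dim):
                # Gradient step on  margin - ei.ej + ei.ek
                emb[i][d] -= lr * (ek[d] - ej[d])
                emb[j][d] += lr * ei[d]
                emb[k][d] -= lr * ei[d]
    return emb


def cluster_papers(emb, threshold=0.5, labels=None):
    """Stand-in clustering (not Diting's algorithm): single-linkage merging
    by cosine similarity via union-find. Optional `labels` maps a paper index
    to a known author (e.g. from a homepage): same author = must-link,
    different authors = cannot-link. Returns a group id per paper."""
    n = len(emb)
    labels = labels or {}
    parent = list(range(n))
    owner = [set() for _ in range(n)]  # known author(s) of each root's group
    for p, author in labels.items():
        owner[p].add(author)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if owner[rx] and owner[ry] and owner[rx] != owner[ry]:
            return  # cannot-link: groups belong to different known authors
        parent[ry] = rx
        owner[rx] |= owner[ry]

    # Must-link: papers listed under the same author start in one group.
    by_author = {}
    for p, author in labels.items():
        by_author.setdefault(author, []).append(p)
    for papers in by_author.values():
        for p in papers[1:]:
            union(papers[0], p)

    # Merge the most similar pairs first until similarity drops below threshold.
    pairs = sorted(((cosine(emb[i], emb[j]), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    for sim, i, j in pairs:
        if sim < threshold:
            break
        union(i, j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

One way to try it is to train on a toy edge list such as `[(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]` and pass the vectors to `cluster_papers`; whether the two triangles come out as separate groups depends on `threshold`, `margin`, and the seed. Because merges are greedy, the cannot-link check makes the result order-dependent, which is a known trade-off of constrained single-linkage compared with the semi-supervised clustering the abstract describes.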
