Hybrid Deep Pairwise Classification for Author Name Disambiguation

Author name disambiguation (AND) is the problem of clustering author mentions, extracted from publication or related records in digital libraries and other sources, so that each cluster corresponds to one unique author. Pairwise classification is an essential part of AND: it estimates the probability that a pair of author mentions belongs to the same author. Previous studies trained classifiers on features manually extracted from each attribute of the data. More recently, others trained models to learn a vector representation from text without considering any structural information. Both approaches have advantages: the former exploits the structure of the data, while the latter captures textual similarity across attributes. Here, we introduce a hybrid method that combines the two by extracting both structure-aware features and global features. In addition, we introduce a novel way to train the global model using a large number of negative samples. Results on AMiner and PubMed data show a relative improvement in mean average precision (MAP) of more than 7.45% over previous state-of-the-art methods.
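
To make the idea of hybrid pairwise classification concrete, the following is a minimal sketch and not the model described in this paper. It assumes a hypothetical record schema (`name`, `title`, `venue`, `coauthors`), uses token-level Jaccard similarity as a stand-in for both the per-attribute (structure-aware) features and the learned global representation, and combines them with a logistic function whose weights are illustrative rather than trained.

```python
import math

# Hypothetical attribute schema for an author mention (an assumption, not from the paper).
ATTRIBUTES = ("name", "title", "venue", "coauthors")


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings (0.0 if both are empty)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def structure_aware_features(m1, m2):
    """One similarity score per attribute, so the record structure is preserved."""
    return [jaccard(m1.get(k, ""), m2.get(k, "")) for k in ATTRIBUTES]


def global_feature(m1, m2):
    """Similarity over the concatenated text, ignoring attribute boundaries.

    A stand-in for a learned vector representation of the whole record.
    """
    text1 = " ".join(m1.get(k, "") for k in ATTRIBUTES)
    text2 = " ".join(m2.get(k, "") for k in ATTRIBUTES)
    return jaccard(text1, text2)


def same_author_probability(m1, m2, weights, bias):
    """Logistic score over the hybrid feature vector (weights are illustrative)."""
    feats = structure_aware_features(m1, m2) + [global_feature(m1, m2)]
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    a = {"name": "J Smith", "title": "deep learning for entity matching",
         "venue": "KDD", "coauthors": "A Lee"}
    b = {"name": "Wei Wang", "title": "graph clustering methods",
         "venue": "ICML", "coauthors": "B Chen"}
    w, bias = [1.0] * 5, -2.5  # hand-picked, untrained values
    print(same_author_probability(a, a, w, bias))
    print(same_author_probability(a, b, w, bias))
```

In a real system the weights would be learned from labeled mention pairs, and the global feature would come from a trained text encoder rather than token overlap.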
