Semantic Author Name Disambiguation with Word Embeddings

We present a supervised machine learning system for author name disambiguation (AND) that models semantic similarity between publication titles by means of word embeddings. The embeddings are integrated as external components, which keeps the model small and efficient while allowing easy extension and domain adaptation. Initial experiments show that word embeddings can improve recall and F-score on the binary classification sub-task of AND. Results for the clustering sub-task are less clear but also promising, and overall they show the feasibility of the approach.
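
To make the idea of embedding-based title similarity concrete, here is a minimal sketch. It is not the system described above. It assumes one common baseline: each title is represented by the average of its word vectors, and two titles are compared by cosine similarity. The vectors in `TOY_EMBEDDINGS` are invented for illustration, and the names `title_vector`, `cosine` and `title_similarity` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. A real setup
# would load pretrained embeddings (e.g. GloVe or word2vec) from an external file.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "learning": [0.7, 0.3, 0.1],
    "protein": [0.0, 0.9, 0.2],
    "folding": [0.1, 0.8, 0.3],
}


def title_vector(title, embeddings):
    """Average the vectors of all in-vocabulary tokens; None if none are known."""
    vectors = [embeddings[tok] for tok in title.lower().split() if tok in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def title_similarity(title_a, title_b, embeddings=TOY_EMBEDDINGS):
    """Similarity of two titles; 0.0 when a title has no known words (an assumption)."""
    vec_a = title_vector(title_a, embeddings)
    vec_b = title_vector(title_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(title_similarity("Deep Learning", "Neural Network"))
    print(title_similarity("Deep Learning", "Protein Folding"))
```

A score like this could serve as one feature for a pairwise classifier that decides whether two publications belong to the same author. Because the embeddings are passed in as a plain lookup table, they can be swapped for domain-specific vectors without touching the rest of the code.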
