A relevance feedback approach for the author name disambiguation problem

This paper presents a new name disambiguation method that exploits user feedback on ambiguous references across iterations. An unsupervised step first identifies pure training samples, and a hybrid supervised step then learns a classification model that assigns references to authors. Our classification scheme combines the Optimum-Path Forest (OPF) classifier with complex similarity functions between references, generated by a Genetic Programming (GP) framework. Experiments on two widely used datasets show that the proposed method outperforms state-of-the-art disambiguation methods.
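The classification step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: the reference fields (`coauthors`, `title`, `venue`), the fixed expression in `gp_like_similarity` (a hand-written stand-in for a tree that a GP framework would evolve), the distance transform 1/(1 + s), and the feedback helper `retrain_with_feedback` are all hypothetical. The OPF part follows the usual supervised formulation: prototypes are the endpoints of minimum-spanning-tree edges that join different labels, a path is scored by its largest arc weight (f_max), and a new reference takes the label of the training sample offering it the cheapest path.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def gp_like_similarity(r1: dict, r2: dict) -> float:
    """Hypothetical stand-in for a GP-evolved tree: 2*co + ti*ve + ti."""
    co = jaccard(r1["coauthors"], r2["coauthors"])
    ti = jaccard(r1["title"], r2["title"])
    ve = jaccard(r1["venue"], r2["venue"])
    return 2.0 * co + ti * ve + ti

def reference_distance(r1: dict, r2: dict) -> float:
    """Map the non-negative similarity to a distance in (0, 1]."""
    return 1.0 / (1.0 + gp_like_similarity(r1, r2))

@dataclass
class OPFModel:
    samples: list
    cost: list   # optimum f_max path cost of each training sample
    label: list  # label inherited from the prototype that conquered it

def opf_train(samples: Sequence, labels: Sequence[Hashable], dist: Callable) -> OPFModel:
    n = len(samples)
    # 1) Prim's MST over the complete graph; both endpoints of every MST
    #    edge that joins two different labels become prototypes.
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] != -1 and labels[parent[u]] != labels[u]:
            prototypes.update((u, parent[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(samples[u], samples[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    if not prototypes:  # single-label training set: any sample can be the root
        prototypes.add(0)
    # 2) Dijkstra-like propagation from the prototypes under f_max:
    #    a path costs as much as its heaviest arc.
    cost, label = [math.inf] * n, [None] * n
    heap = []
    for p in prototypes:
        cost[p], label[p] = 0.0, labels[p]
        heapq.heappush(heap, (0.0, p))
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s] or c > cost[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if not done[t]:
                new_cost = max(cost[s], dist(samples[s], samples[t]))
                if new_cost < cost[t]:
                    cost[t], label[t] = new_cost, label[s]
                    heapq.heappush(heap, (new_cost, t))
    return OPFModel(list(samples), cost, label)

def opf_classify(model: OPFModel, x, dist: Callable):
    """Label of the training sample that offers x the cheapest f_max path."""
    best_cost, best_label = math.inf, None
    for s, c, lab in zip(model.samples, model.cost, model.label):
        candidate = max(c, dist(s, x))
        if candidate < best_cost:
            best_cost, best_label = candidate, lab
    return best_label

def retrain_with_feedback(samples, labels, feedback, dist) -> OPFModel:
    """Hypothetical feedback step: append user-confirmed (reference, author) pairs and retrain."""
    return opf_train(list(samples) + [r for r, _ in feedback],
                     list(labels) + [a for _, a in feedback], dist)

TRAIN_REFS = [
    {"coauthors": {"smith", "lee"}, "title": {"graph", "clustering"}, "venue": {"jcdl"}},
    {"coauthors": {"smith"}, "title": {"graph", "partition"}, "venue": {"jcdl"}},
    {"coauthors": {"chen", "wu"}, "title": {"protein", "folding"}, "venue": {"bioinf"}},
    {"coauthors": {"wu"}, "title": {"protein", "structure"}, "venue": {"bioinf"}},
]
TRAIN_LABELS = ["author_1", "author_1", "author_2", "author_2"]
QUERY_1 = {"coauthors": {"smith", "park"}, "title": {"graph", "mining"}, "venue": {"jcdl"}}
QUERY_2 = {"coauthors": {"wu", "kim"}, "title": {"protein", "design"}, "venue": {"bioinf"}}

if __name__ == "__main__":
    model = opf_train(TRAIN_REFS, TRAIN_LABELS, reference_distance)
    print(opf_classify(model, QUERY_1, reference_distance),
          opf_classify(model, QUERY_2, reference_distance))
    # Feedback: the user reassigns QUERY_2 to a new author, and the model is retrained.
    model = retrain_with_feedback(TRAIN_REFS, TRAIN_LABELS, [(QUERY_2, "author_3")],
                                  reference_distance)
    print(opf_classify(model, QUERY_2, reference_distance))
```

Running it prints `author_1 author_2`, then `author_3` once the user's correction is folded into the training set. Training evaluates O(n²) distances, so this sketch only suits small examples, and the unsupervised step that selects the initial training samples is not modeled here.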