Consolidation of References to Persons in Bibliographic Databases

Entity resolution is the process of determining if, in a specific context, two or more references correspond to the same entity. In this work, we address this problem in the context of references to persons as they are found in bibliographic data, specifically in the case of consolidating multiple datasets. Or solution follows the extraction, transformation and loading (ETL) process, typical in data warehouses. It computes the similarities of the attribute values for the references, and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work by designing an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process.