An Analysis of the SUDOC Bibliographic Knowledge Base from a Link Validity Viewpoint

With the aim of evaluating and improving link quality in bibliographic knowledge bases, we develop a decision support system based on partitioning semantics. The novelty of our approach lies in partitioning on symbolic-valued criteria and in defining partitioning semantics suited to such criteria. In this paper we evaluate and compare these semantics on a real qualitative sample. The sample is drawn from the catalogue of French university libraries (SUDOC), a bibliographic knowledge base maintained by the University Bibliographic Agency (ABES).
