Investigating the quality of a bibliographic knowledge base using partitioning semantics

With the aim of evaluating and improving link quality in bibliographical knowledge bases, we develop a decision support system based on partitioning semantics. Two such semantics have been proposed; the novelty of this approach consists in using criteria with symbolic values for partitioning. In this paper we investigate the limits of those partitioning semantics: how the characteristics of the input (objects and criteria) influence the characteristics of the result, namely the correctness of the result and the execution time.
