Paper information - Investigating the quality of a bibliographic knowledge base using partitioning semantics

With the aim of evaluating and improving link quality in bibliographic knowledge bases, we develop a decision support system based on partitioning semantics. Two such semantics have been proposed; the novelty of this approach consists in using criteria with symbolic values for partitioning. In this paper we investigate the limits of those partitioning semantics: how the characteristics of the input (objects and criteria) influence the characteristics of the result, namely its correctness and the execution time.

Madalina Croitoru | Léa Guizol
