Using Conditional Functional Dependency to Discover Abnormal Data in RDF Graphs

Many data quality issues, such as data consistency, deduplication, accuracy, and completeness, have been studied for relational data. In this paper, we focus on discovering abnormal data in RDF graphs. As the amount of RDF data grows, data quality is becoming an important issue for the usability of RDF repositories. Although association rules have been used to find abnormal data in RDF graphs, existing solutions ignore the latent semantics of connected structures in RDF graphs. To detect latent dependencies in RDF graphs, we first define the Graph-based Conditional Functional Dependency (GCFD), which represents both attribute-value dependencies and semantic-structure dependencies of RDF data in a uniform manner. We then propose an efficient framework and several novel pruning rules to discover GCFDs in large RDF graphs. Extensive experiments on several real-life RDF repositories confirm the superiority of our solution.
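The abstract does not give the formal GCFD definition, so the following is only a minimal sketch of the general idea behind a conditional functional dependency checked against RDF triples. A condition restricts which entities the rule applies to (here, a hypothetical `rdf:type` value), and within that scope one predicate's value should determine another's. The triples, the predicate names, and the function `find_violations` are all hypothetical. The check is a plain CFD-style violation scan. It does not model the structural (path or subgraph) dependencies that GCFDs capture, and it does not represent the paper's discovery framework or pruning rules.

```python
from collections import defaultdict

# Hypothetical toy RDF graph as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:zip", "10001"),
    ("ex:alice", "ex:city", "New York"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:zip", "10001"),
    ("ex:bob", "ex:city", "Boston"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:carol", "ex:zip", "02108"),
    ("ex:carol", "ex:city", "Boston"),
]


def index_by_subject(triples):
    """Map each subject to {predicate: set of object values}."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[s][p].add(o)
    return index


def find_violations(triples, cond_pred, cond_value, lhs_pred, rhs_pred):
    """Hypothetical check of the rule: for subjects whose `cond_pred` has
    value `cond_value`, the value of `lhs_pred` determines the value of
    `rhs_pred`. Returns the sorted subjects involved in any violation."""
    index = index_by_subject(triples)
    # lhs value -> rhs value -> subjects exhibiting that pair
    groups = defaultdict(lambda: defaultdict(set))
    for subj, props in index.items():
        if cond_value not in props.get(cond_pred, set()):
            continue  # condition not satisfied, so the rule does not apply
        for lhs in props.get(lhs_pred, set()):
            for rhs in props.get(rhs_pred, set()):
                groups[lhs][rhs].add(subj)
    flagged = set()
    for rhs_map in groups.values():
        if len(rhs_map) > 1:  # one LHS value maps to several RHS values
            for subjects in rhs_map.values():
                flagged |= subjects
    return sorted(flagged)


print(find_violations(TRIPLES, "rdf:type", "ex:Person", "ex:zip", "ex:city"))
# → ['ex:alice', 'ex:bob']
```

Here `ex:alice` and `ex:bob` share the zip code `10001` but have different cities, so both are flagged as potentially abnormal. `ex:carol` is consistent with the rule. Discovering such rules automatically, rather than checking a given rule, is the harder problem that the abstract's framework and pruning rules address.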