Paper Information - Can You Find All the Data You Expect in a Linked Dataset?

Can You Find All the Data You Expect in a Linked Dataset?

The huge volume of datasets available on the Web has motivated the development of a new class of Web applications that allow users to perform complex queries on top of a set of predefined linked datasets. However, given the large number of available datasets and the lack of information about their quality, selecting datasets for a particular application can become a very complex and time-consuming task. In this work, we argue that one way to support dataset selection for a given application is to evaluate the completeness of each dataset with respect to the data that the application's users consider important. With this in mind, we propose an approach to assess the completeness of a linked dataset that considers a set of specific data requirements and saves a large amount of query processing. To provide a more detailed evaluation, we propose three distinct types of completeness: schema, literal, and instance completeness. We present the definitions underlying our approach and some results of the evaluation we carried out.

Damires Souza | Bernadette Farias Lóscio | Walter Travassos Sarinho
