Making quality count in biological data sources

We propose an extension to the semistructured data model that captures and integrates information about the quality of the stored data. Specifically, we describe the main challenges involved in measuring and representing data quality, and how we address them. These challenges include extending an existing data model to include quality metadata, identifying useful quality measures, and devising a way to compute and update the values of the quality measures as data is queried and updated. Although our approach can be generalized to other domains, it is currently aimed at describing the quality of biological data sources. We illustrate the benefits of our model using several examples from biological databases.
