Towards a Compositional Semantic Account of Data Quality Attributes

We address the fundamental question: what does it mean for data in a database to be of high quality? We motivate the discussion with examples in which traditional views of data quality prove unsatisfactory. Our work rests on the premise that data values are primarily linguistic signs that convey meaning from their producer to their user through senses and referents. In this setting, data quality issues arise when discrepancies occur in this communication. We sketch a theory of senses for the individual values in a relational table, based on the table's semantics as expressed in an ontology. We use this theory to offer a compositional approach in which data quality is expressed in terms of a variety of primitive relationships among values and their senses. We evaluate the approach by accounting for quality attributes from other frameworks proposed in the literature. This exercise allows us to (i) reveal and differentiate multiple, sometimes conflicting, definitions of a quality attribute, (ii) accommodate competing views on how these attributes are related, and (iii) point to possible new definitions.
