How good is a company’s data quality? Answering this question requires usable data quality metrics. Currently, most data quality measures are developed on an ad hoc basis to solve specific problems [6, 8], and fundamental principles necessary for developing usable metrics in practice are lacking. In this article, we describe principles that can help organizations develop usable data quality metrics.

Studies have confirmed data quality is a multi-dimensional concept [1, 2, 6, 9, 10, 12]. Companies must deal with both the subjective perceptions of the individuals involved with the data, and the objective measurements based on the data set in question.

Subjective data quality assessments reflect the needs and experiences of stakeholders: the collectors, custodians, and consumers of data products [2, 11]. If stakeholders assess the quality of data as poor, their behavior will be influenced by this assessment. One can use a questionnaire to measure stakeholder perceptions of data quality dimensions. Many healthcare, finance, and consumer product companies have used one such questionnaire, developed to assess data quality dimensions listed in Table 1 [7]. A major U.S. bank that administered the questionnaire found custodians (mostly MIS professionals) view their data as highly timely, but consumers disagree; and data consumers view data as difficult to manipulate for their business purposes, but custodians disagree [4, 6]. A follow-up investigation into the root causes of differing assessments provided valuable insight on areas needing improvement.

Objective assessments can be task-independent or task-dependent. Task-independent metrics reflect states of the data without the contextual knowledge of the application, and can be applied to any data set, regardless of the tasks at hand. Task-dependent metrics, which include the organization’s business rules, company and government regulations, and constraints provided by the database administrator, are developed in specific application contexts.
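To make the distinction between task-independent and task-dependent metrics concrete, the following minimal Python sketch may help. It is an illustration only, not a metric definition taken from this article: the function names `completeness` and `rule_conformance`, the sample `customers` records, the credit-limit ceiling of 10000, and the convention that an empty data set scores 1.0 are all assumptions introduced for the example.

```python
from typing import Callable, Iterable, Mapping, Optional

Record = Mapping[str, Optional[object]]


def completeness(records: Iterable[Record], field: str) -> float:
    """Task-independent sketch: share of records whose `field` is filled in.

    It needs no knowledge of how the data will be used, so it can be
    computed on any data set. An empty data set scores 1.0 (assumed convention).
    """
    rows = list(records)
    if not rows:
        return 1.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def rule_conformance(records: Iterable[Record],
                     rule: Callable[[Record], bool]) -> float:
    """Task-dependent sketch: share of records that satisfy a business rule.

    The rule encodes application context (policy, regulation, constraint),
    so the same data can score differently under different rules.
    """
    rows = list(records)
    if not rows:
        return 1.0
    return sum(1 for r in rows if rule(r)) / len(rows)


# Hypothetical sample data, invented for this illustration.
customers = [
    {"id": 1, "email": "a@example.com", "credit_limit": 5000},
    {"id": 2, "email": "", "credit_limit": 12000},
    {"id": 3, "email": "c@example.com", "credit_limit": None},
    {"id": 4, "email": "d@example.com", "credit_limit": 8000},
]


def within_credit_policy(record: Record) -> bool:
    """Hypothetical business rule: a limit must exist and lie in [0, 10000]."""
    limit = record.get("credit_limit")
    return limit is not None and 0 <= limit <= 10000


email_completeness = completeness(customers, "email")
policy_conformance = rule_conformance(customers, within_credit_policy)
print(email_completeness, policy_conformance)
```

The first function applies unchanged to any field of any data set, whereas the second produces a meaningful score only once a context-specific rule has been supplied.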
[1] Diane M. Strong, et al. Information quality benchmarks: product and service performance. CACM, 2002.
[2] Diane M. Strong, et al. Beyond Accuracy: What Data Quality Means to Data Consumers. J. Manag. Inf. Syst., 1996.
[3] Richard Y. Wang, et al. Quality information and knowledge. 1998.
[4] Thomas Redman, et al. Data quality for the information age. 1996.
[5] Richard Y. Wang, et al. Modeling Information Manufacturing Systems to Determine Information Product Quality Management Scien. 1998.
[6] Donald P. Ballou, et al. Modeling Data and Process Quality in Multi-Input, Multi-Output Information Systems. 1985.
[7] Kenneth C. Laudon, et al. Data quality and due process in large interorganizational record systems. CACM, 1986.
[8] Richard Y. Wang, et al. A product perspective on total data quality management. CACM, 1998.
[9] Richard Y. Wang, et al. Anchoring data quality dimensions in ontological foundations. CACM, 1996.
[10] E. F. Codd, et al. Relational database: a practical foundation for productivity. CACM, 1982.