An event-based framework for improving information quality that integrates baseline models, causal models, and formal reference models

We introduce a framework for improving information quality in complex distributed systems that integrates three types of models: 1) Analytic models that describe baseline values for attributes and combinations of attributes, together with components that detect statistically significant changes from these baselines. These models determine whether a significant change has occurred and, if so, when. 2) Causal models that help determine why a statistically significant change has occurred and what its impact is. These models focus on the reasons for a change. 3) Formal business and technical reference models, which make data and information quality problems less likely to occur in the future. In this note, we focus on the first two types of models and describe how the framework applies to data quality problems associated with electronic payments transactions and highway traffic patterns.
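To make the role of the analytic models concrete, the sketch below estimates a baseline (mean and standard deviation) for a single attribute from a reference window, then runs a two-sided CUSUM detector over new observations. The detector answers both questions posed above: whether a significant change has occurred (an alarm is raised or not) and when (the index of the alarm). CUSUM is one standard change-detection technique chosen here for illustration; the abstract does not say which detector the framework uses, so the function names, the allowance `k`, and the threshold `h` are assumptions, not the authors' method.

```python
import math
from typing import Optional, Sequence, Tuple


def estimate_baseline(history: Sequence[float]) -> Tuple[float, float]:
    """Baseline mean and sample standard deviation from a reference window."""
    n = len(history)
    mean = sum(history) / n
    var = sum((x - mean) ** 2 for x in history) / (n - 1)
    return mean, math.sqrt(var)


def cusum_change_point(
    values: Sequence[float],
    mean: float,
    std: float,
    k: float = 0.5,  # assumed allowance (slack), in baseline standard deviations
    h: float = 5.0,  # assumed decision threshold, in baseline standard deviations
) -> Optional[int]:
    """Two-sided CUSUM on values standardized against the baseline.

    Returns the index at which the alarm is raised (the "when"),
    or None if no significant change is detected (the "whether").
    """
    upper = 0.0  # accumulates evidence of an upward shift
    lower = 0.0  # accumulates evidence of a downward shift
    for i, x in enumerate(values):
        z = (x - mean) / std
        upper = max(0.0, upper + z - k)
        lower = max(0.0, lower - z - k)
        if upper > h or lower > h:
            return i
    return None


# Hypothetical daily counts for one attribute (made-up illustrative values).
baseline_mean, baseline_std = estimate_baseline([100, 102, 98, 101, 99, 100, 103, 97])
alarm = cusum_change_point([101, 99, 100, 90, 91, 89, 90, 92], baseline_mean, baseline_std)
print(baseline_mean, baseline_std, alarm)  # 100.0 2.0 4
```

In this made-up series the level drops at index 3, but the alarm is raised at index 4: CUSUM trades a short detection delay for resistance to isolated outliers. Explaining why the drop happened is the job of the causal models, not of this detector.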
