A comprehensive data quality methodology for web and structured data

Measuring and improving data quality in an organisation, or in a group of interacting organisations, is a complex task. Several methodologies have been developed in the past, providing a basis for the definition of a data quality programme that guarantees high data quality levels. Since the main limitation of existing approaches is their specialisation on specific issues or contexts, this paper presents a Comprehensive Data Quality (CDQ) methodology. The main aim of the CDQ methodology is to integrate and enhance the phases, techniques and tools proposed by previous approaches. In particular, the CDQ methodology is conceived to be at the same time complete, flexible and simple to apply. Completeness is achieved by considering existing techniques and tools and integrating them in a framework that can work in any organisation. The methodology is flexible, since it supports the user in selecting the most suitable techniques and tools within each phase and in any context. Finally, CDQ is simple, since it is organised in phases, each characterised by a specific goal and a set of techniques to apply. The methodology is explained by means of a running example, and significant cases of its application are reported.
