Metadata Management for Quality-Oriented Information Logistics in Data Warehouse Systems (Metadatenverwaltung zur qualitätsorientierten Informationslogistik in Data-Warehouse-Systemen)

The goal of a data warehouse system is to provide a comprehensive overview of the data available in a company and thereby to support management decisions. Integrating data from heterogeneous sources is one of the key problems in data warehousing. The technical foundations for this integration have been developed in recent years, but an efficient technical infrastructure alone does not address two problems. First, the data in the participating systems have different semantics. Second, users have different requirements regarding data quality. Existing systems are unable to solve these problems. This thesis supports the development of data warehouse systems with special attention to semantics and data quality. The approach is based on explicitly modelling the metadata of data warehouse systems. In particular, the conceptual context, the quality requirements, and the quality characteristics of the individual system components are represented in a formal model. The main contributions are, first, an extended metamodel of the architecture and processes of a data warehouse system and, second, a quality model for the systematic representation of quality requirements and measurements. Furthermore, a classification of quality dimensions and factors is developed that can be used for comprehensive quality management in data warehouse systems. The metadata is applied in a model for quality management as well as in a methodology for quality-oriented data integration, which uses the metadata to combine different existing approaches to data integration. The results are validated in various case studies in industrial contexts and in international research projects.
