A direct approach to physical Data Vault design

The paper presents a novel agile approach to the large-scale design of enterprise data warehouses based on the Data Vault model. An original, simple, and direct algorithm is defined for the incremental design of physical Data Vault enterprise data warehouses from a source data meta-model and a set of rules, and the algorithm is used to develop a prototype CASE tool for Data Vault design. The approach satisfies the primary requirement for a system of record, namely the preservation of all source information, and addresses expectations of flexibility and scalability. It benefits from the minimized dependencies and rapid-load opportunities of Data Vault, which enable greatly simplified ETL transformations in a way not possible with traditional (i.e., non-Data-Vault-based) data warehouse designs. The approach is illustrated using a realistic example from the healthcare domain.
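
To make the idea of rule-based design from source metadata concrete, the following is a minimal sketch and not the algorithm defined in the paper. It assumes two commonly used Data Vault conventions: a source table whose primary key consists entirely of foreign keys becomes a link, and any other table becomes a hub, with non-key attributes placed in a satellite. The names `SourceTable` and `derive_vault`, and the healthcare tables shown, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTable:
    """Hypothetical, simplified description of one source table."""
    name: str
    primary_key: list[str]
    # Maps a foreign-key column to the table it references.
    foreign_keys: dict[str, str] = field(default_factory=dict)
    # Non-key, descriptive columns.
    attributes: list[str] = field(default_factory=list)


def derive_vault(tables):
    """Classify source tables into hubs, links, and satellites.

    Assumed rules (illustrative only):
      * primary key made up of two or more foreign keys -> link
      * otherwise -> hub keyed by the primary key, plus one link
        per foreign key to the referenced hub
      * non-key attributes -> satellite attached to the hub or link
    """
    hubs, links, satellites = [], [], []
    for t in tables:
        pk_is_all_fk = len(t.primary_key) >= 2 and all(
            col in t.foreign_keys for col in t.primary_key
        )
        if pk_is_all_fk:
            parent = f"link_{t.name}"
            connected = sorted({f"hub_{t.foreign_keys[c]}" for c in t.primary_key})
            links.append((parent, connected))
        else:
            parent = f"hub_{t.name}"
            hubs.append((parent, list(t.primary_key)))
            for ref in t.foreign_keys.values():
                links.append((f"link_{t.name}_{ref}", [parent, f"hub_{ref}"]))
        if t.attributes:
            satellites.append((f"sat_{t.name}", parent, list(t.attributes)))
    return hubs, links, satellites


# Hypothetical healthcare source schema.
tables = [
    SourceTable("patient", ["patient_id"], attributes=["name", "birth_date"]),
    SourceTable("doctor", ["doctor_id"], attributes=["name", "specialty"]),
    SourceTable(
        "visit",
        ["patient_id", "doctor_id"],
        foreign_keys={"patient_id": "patient", "doctor_id": "doctor"},
        attributes=["visit_date", "diagnosis"],
    ),
]

hubs, links, satellites = derive_vault(tables)
for kind, items in (("hubs", hubs), ("links", links), ("satellites", satellites)):
    print(kind, items)
```

Under these assumed rules, `patient` and `doctor` become hubs, `visit` becomes a link between them, and each table's descriptive columns land in a satellite. Because every source column ends up in a hub key, a link, or a satellite, no source information is discarded, which is the system-of-record property the abstract emphasizes.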
