Data Warehousing: Design, Development and Best Practices

Data Warehousing: Design, Development and Best Practices. By Soumendra Mohanty. Tata McGraw Hill Publishing Company Limited, New Delhi, 2006. Pages: 375; Price: Rs. 395; ISBN: 0-07-059920-3.

INTRODUCTION

There is a plethora of information available on Data Warehousing (DW) methodology, technology and implementation, but very few attempts have been made to synthesize it into a concise form. In this book, the author discusses the DW lifecycle end-to-end, and each stage is backed by real-life experience presented in the form of case studies, examples, diagrams and tables. The author, being a DW practitioner, portrays the real-life issues that arise from inception to implementation and shows how a consortium of business, technology and project management experts can overcome them. The unique feature of this book, however, is its collection of best practices that have been tried and tested and have stood the test of time, place and complexity.

The book is organized into three stages (Foundation, Consolidation and Advanced) divided into 15 chapters. The first five chapters are dedicated to DW concepts, framework and methodology, and to sample problems that DW can solve. The next seven chapters deal with more advanced topics such as best practices for Extract, Transform and Load (ETL), dimensional modeling, metadata management, data quality control, testing, performance analysis and Return on Investment (ROI) analysis. The book concludes with the concepts of MOLAP, ROLAP and real-time DW, and with how DW can play a significant role in Customer Relationship Management (CRM).

FOUNDATION STAGE

This stage introduces DW along the lines suggested by W. H. Inmon, i.e., "a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process". It explains the business situations where DW can be helpful and how DW differs from operational or informational systems.
Next, the DW lifecycle is elaborated with emphasis on requirement management techniques, multilayered architecture and dimensional modeling. The key underlying principle is that the entire DW initiative should be backed by one or more strong business cases, each precise and pointed enough to be shared during an elevator ride (the 'elevator test'). A case study on the retail banking industry is used to substantiate these points. However, some portions of Chapter 3 have been inadvertently copied from Tom Haughey's article, "Is Dimensional Modeling One of the Great Con Jobs in Data Management History?", published in DM Review Magazine, April 2004 issue, especially the topics on 'Helper Tables', 'Multi-valued Dimensions', 'Complex Hierarchies', etc. A reference to the original article would have justified this.

CONSOLIDATION STAGE

In the next phase, the author tries to consolidate the theory of building a data warehouse with its practicalities. Time is a dimension in almost all DW applications, but the modeling decision is not easy because there are many alternatives depending on the scenario (e.g., Transaction Snapshot, Periodic Snapshot, Accumulating Snapshot, etc., for fact table extraction). It is also explained that a fact table can work as a dimension table, and vice versa, depending on the situation. The ETL process is divided into a number of subprocesses, e.g., interface, trigger, audit, etc. The author rightly cautions practitioners against ignoring aspects such as event management, performance and network monitoring, backup and recovery, and service level agreements while paying all their attention to facts, dimensions, ETLs, KPIs, queries, etc. Data quality is given special emphasis in this work. The author draws an analogy between software development and data management processes by explaining how the Capability Maturity Model (CMM) philosophy can be applied to both to measure the maturity level.
The text is enriched with techniques and examples of data assessment, profiling, cleansing and integration. …