Physical Data Warehousing Design

Organizations increasingly emphasize applications in which current and historical data are analyzed and explored comprehensively, identifying useful trends and creating summaries of the data to support high-level decision making. Every organization keeps accumulating data from its different functional units so that, after integration, these data can be analyzed and important decisions can be made from the analytical results. Conceptually, a data warehouse is extremely simple. As popularized by Inmon (1992), it is a “subject-oriented, integrated, time-variant, nonupdatable collection of data used to support management decision-making processes and business intelligence”. A data warehouse is a repository into which all data relevant to the management of an organization are placed, and from which the information and knowledge needed to manage the organization effectively emerge. This management can draw on data-mining techniques, comparisons of historical data, and trend analysis. For such analysis, it is vital that (1) data be accurate, complete, consistent, well defined, and time-stamped for informational purposes; and (2) data follow business rules and satisfy integrity constraints. Designing a data warehouse is a lengthy, iterative process. Because data warehouse applications are interactive, fast query response time is a critical performance goal. The physical design of a warehouse therefore receives the lion’s share of the research done in the data warehousing area. Several techniques have been developed to meet the performance requirements of such applications, including materialized views, indexing techniques, and partitioning and parallel processing. Next, we briefly outline the architecture of a data warehousing system.
