Denormalization effects on performance of relational database for data warehouse

The objective of normalization is to organize data into stable structures, and thereby minimize update anomalies and maximize data accessibility. Although normalization provides many benefits and is regarded as the rule for relational database design, there is at least one major drawback—poor system performance. In particular, the dual but related notions of data warehouses and decision support systems are two critical areas where denormalization issues are important, but have been largely ignored. Denormalization techniques have frequently been adopted in practice. However, denormalization lacks solid principles and guidelines, and thus remains a human-intensive process. There has been a paucity of research related to illustrating how denormalization enhances database performance and reduces query response time. The goal of this research is to provide comprehensive guidelines regarding when and how to effectively exercise denormalization strategies. In this study, four prevalent denormalization models have been identified based on an extensive literature review: Collapsing Tables, Partitioning a Table, Adding Redundant Columns, and Adding Derived Columns. The denormalization process is proposed as an intermediary step between logical and physical database design. After a review of journal articles and experts' recommendations in professional journals, several criteria have been identified for evaluating database designs and in particular denormalization strategies. These criteria are focused on reducing database access costs, and they are affected by database activity, computer system characteristics and physical factors. This research uses three approaches to evaluate denormalization strategies on the criteria outlined above. (1) Relational Algebra and Query Tree Analysis: This approach analyzes the operations on database. (2) Join Cost Function Analysis: This approach estimates costs of join operation in terms of access cost to secondary storage. (3)  Experiments with physical model: This approach analyzes the effects of denormalization in terms of related factors such as logical structure, database volume, and database volatility. The results show that there are positive effects of denormalization on database performance, and that four denormalization models are viable. The denormalization procedure is proposed as an intermediate step between logical and physical modeling. Two meaningful conclusions are derived from this study: (1) denormalization may offer positive effect's on performance of database for data warehouse, and (2) both analytical methods and the experiment results will offer an efficient way to assess the effects of denormalization. The guidelines and methodologies are sufficiently general so that they can be applicable to most databases.