Improving query response time in scientific databases using data aggregation: a case study

Although most state-of-the-art database systems have no inherent limitations w.r.t. the amount of data they can handle, the huge data volumes typically found in scientific database applications often push query performance beyond what is practically feasible. One theoretically well-known way to improve query response time in scientific database applications is to use the categorization and classification schemes common in scientific computing domains to store data aggregations, so that expensive access to raw data can be replaced by the use of stored aggregated values. The results of an empirical performance study carried out in the application domain of market research are presented; they substantiate the practical importance of such work. Using real market research data, it is shown that query response time can be reduced by an order of magnitude if a proper data aggregation concept is used. If the data aggregates are designed properly, the overhead of generating and managing materialized aggregates is far outweighed by the improved query performance in realistic scenarios.
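To illustrate the idea of substituting raw-data access with stored aggregates, here is a minimal sketch, not the study's implementation. It assumes a hypothetical product-to-category classification (`PRODUCT_CATEGORY`) and toy sales rows; all names are illustrative. It compares answering a category-level query by scanning raw rows with answering it from a precomputed aggregate, and includes the incremental update that represents the maintenance overhead of a materialized aggregate.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical classification scheme: each product belongs to exactly one category.
PRODUCT_CATEGORY: Dict[str, str] = {"p1": "beverages", "p2": "beverages", "p3": "snacks"}

# Hypothetical raw fact row: (product, region, units_sold).
RawRow = Tuple[str, str, int]
Aggregate = Dict[Tuple[str, str], int]


def query_raw(rows: List[RawRow], category: str, region: str) -> Tuple[int, int]:
    """Answer the query by scanning every raw row; returns (total, rows_read)."""
    total = 0
    for product, row_region, units in rows:
        if PRODUCT_CATEGORY[product] == category and row_region == region:
            total += units
    return total, len(rows)


def materialize(rows: List[RawRow]) -> Aggregate:
    """Precompute SUM(units) grouped by (category, region) once, ahead of queries."""
    agg: Dict[Tuple[str, str], int] = defaultdict(int)
    for product, region, units in rows:
        agg[(PRODUCT_CATEGORY[product], region)] += units
    return dict(agg)


def add_row(agg: Aggregate, row: RawRow) -> None:
    """Maintenance overhead: keep the stored aggregate consistent when new raw data arrives."""
    product, region, units = row
    key = (PRODUCT_CATEGORY[product], region)
    agg[key] = agg.get(key, 0) + units


def query_aggregate(agg: Aggregate, category: str, region: str) -> Tuple[int, int]:
    """Answer the same query with a single lookup in the stored aggregate."""
    return agg.get((category, region), 0), 1


if __name__ == "__main__":
    rows = [("p1", "north", 10), ("p2", "north", 5), ("p3", "north", 7), ("p1", "south", 3)]
    agg = materialize(rows)
    print(query_raw(rows, "beverages", "north"))       # (15, 4): four raw rows read
    print(query_aggregate(agg, "beverages", "north"))  # (15, 1): one stored value read
```

Both queries return the same total, but the raw scan reads every row while the aggregate lookup reads a single stored value. The gap grows with data volume, whereas `add_row` adds a small, constant cost per inserted row. This is the trade-off the abstract describes, shown here only in toy form.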