Efficient OLAP query processing in distributed data warehouses

The success of Internet applications has led to explosive growth in the demand for bandwidth from ISPs. Managing an IP network involves complex data analysis that can often be expressed as OLAP queries. Current OLAP tools assume that the detailed data is available in a centralized warehouse. However, the inherently distributed nature of the data collection (e.g., flow-level traffic statistics are gathered at network routers) and the huge amount of data extracted at each collection point (on the order of several gigabytes per day for large IP networks) make such an approach highly impractical. The natural solution to this problem is to maintain a distributed data warehouse, consisting of multiple local data warehouses (sites) adjacent to the collection points, together with a coordinator. For such a solution to be viable, we need a technology for distributed processing of complex OLAP queries. We have developed the Skalla system for this task. We conducted an experimental study of the Skalla evaluation scheme using TPC(R) data.
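The site-and-coordinator arrangement described above can be illustrated with a minimal sketch. It is not the Skalla evaluation scheme, which the abstract does not detail. It shows only the general idea of reducing detailed records to small partial aggregates at each site and merging them at a coordinator. The function names, the record layout (source, bytes), and the sample data are all hypothetical.

```python
from collections import defaultdict

def site_partial_aggregate(flows):
    """Hypothetical site-side step: reduce detailed flow records
    (source, num_bytes) to a (byte_sum, flow_count) pair per source."""
    partial = defaultdict(lambda: [0, 0])
    for source, num_bytes in flows:
        partial[source][0] += num_bytes
        partial[source][1] += 1
    return {key: tuple(value) for key, value in partial.items()}

def coordinator_merge(partials):
    """Hypothetical coordinator-side step: combine the per-site partial
    aggregates and derive the average from the merged sums and counts."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {
        key: {"total_bytes": total, "flows": count, "avg_bytes": total / count}
        for key, (total, count) in merged.items()
    }

# Illustrative data only: two sites, each holding its own detailed records.
site_a = [("AS1", 100), ("AS2", 300), ("AS1", 200)]
site_b = [("AS1", 300), ("AS3", 50)]

result = coordinator_merge(
    [site_partial_aggregate(site_a), site_partial_aggregate(site_b)]
)
print(result["AS1"])
```

Only the per-group sums and counts cross the network in this sketch. The detailed records stay at the sites, which is the motivation the abstract gives for avoiding a centralized warehouse.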
