Advanced Implementation Techniques for Scientific Data Warehouses

Data warehouses using a multidimensional view of data have become very popular in both business and science in recent years. Data warehouses for scientific purposes such as medicine and biochemistry pose several great challenges to existing data warehouse technology. Data warehouses usually use pre-aggregated data to ensure fast query response. However, pre-aggregation cannot be used in practice if the dimension structures or the relationships between facts and dimensions are irregular. A technique for overcoming this limitation is presented, along with some experimental results. Queries over scientific data warehouses often need to reference data that is external to the data warehouse, e.g., data that is too complex to be handled by current data warehouse technology, data that is "owned" by other organizations, or data that is updated frequently. Examples of such data are the public genome databases such as Swissprot. This paper presents a federation architecture that allows the integration of multidimensional warehouse data with complex external data.
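To illustrate why irregular dimension structures interfere with pre-aggregation, the following minimal Python sketch shows one kind of irregularity: a hierarchy in which a lower-level value has more than one parent. The patients, diagnoses, and family names are hypothetical and chosen only for illustration; the sketch demonstrates the problem, not the technique presented in the paper.

```python
from collections import defaultdict

# Hypothetical facts: (patient, low-level diagnosis).
facts = [("p1", "d1"), ("p2", "d2"), ("p3", "d2")]

# Hypothetical irregular hierarchy: diagnosis d2 belongs to two families.
parents = {"d1": ["famA"], "d2": ["famA", "famB"]}


def count_by_family(facts, parents):
    """Pre-aggregate the number of facts per diagnosis family."""
    counts = defaultdict(int)
    for _patient, diagnosis in facts:
        for family in parents[diagnosis]:
            counts[family] += 1
    return dict(counts)


# Stored (pre-aggregated) counts at the family level.
family_counts = count_by_family(facts, parents)

# Reusing the stored family counts to answer "how many patients in total?"
total_from_preaggregates = sum(family_counts.values())

# Correct answer computed directly from the base facts.
total_from_base = len({patient for patient, _ in facts})

print(family_counts)
print(total_from_preaggregates, total_from_base)
```

Summing the stored family-level counts counts the patients diagnosed with d2 twice, so the reused aggregate disagrees with the result computed from the base data. This is why pre-aggregated results cannot simply be combined when the hierarchy is irregular.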
