Specifying OLAP Cubes on XML Data

On-Line Analytical Processing (OLAP) enables analysts to gain insight into data through fast, interactive access to a variety of possible views of information organized in a dimensional model. The demand for data integration is growing rapidly as more and more information sources appear in modern enterprises. In the data warehousing approach, selected information is extracted in advance and stored in a repository, yielding good query performance. However, in many situations a logical (rather than physical) integration of data is preferable. Previous web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques focused on the conceptual level. Also, previous integration techniques for web-based data have not addressed the special needs of OLAP tools, such as handling dimensions with hierarchies. Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web. The rapid emergence of XML data on the web, e.g., in business-to-business (B2B) e-commerce, is making it necessary for OLAP and other data analysis tools to handle XML data as well as traditional data formats. Based on a real-world case study, this paper presents an approach to the specification of OLAP databases based on web data. Unlike previous work, this approach takes special OLAP issues such as dimension hierarchies and correct aggregation of data into account. Also, the approach works on the conceptual level, using the Unified Modeling Language (UML) as a basis for so-called UML snowflake diagrams that precisely capture the multidimensional structure of the data. An integration architecture that allows the logical integration of XML and relational data sources for use by OLAP tools is also presented.
