An ETL Process for OLAP Using RDF/OWL Ontologies

In this paper, we present a method for on-demand construction of OLAP cubes for ROLAP systems. The method covers the steps from cube design to ETL, with the emphasis on ETL. Actual data analysis can then be done using the tools and methods of the OLAP software at hand. The method is based on RDF/OWL ontologies and design tools. The ontology serves as a basis for designing and creating the OLAP schema and its corresponding database tables, and finally for populating the database. Our starting point is a set of heterogeneous, distributed data sources that are eventually used to populate the OLAP cubes. The source data are mapped to their OLAP form by first converting them to RDF using ontology maps. The data are then extracted from their RDF form by queries generated from the ontology of the OLAP schema. Finally, the extracted data are stored in the database tables and analysed using OLAP software. Algorithms and examples are provided for all these steps. In our tests, we used an open-source OLAP implementation and a database server. The performance of the system was found satisfactory when tested with a data source of 450 000 RDF statements. We also propose an ontology-based tool that would serve as a user interface to the system, from design to actual analysis.
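The three ETL steps described above (map source data to RDF, extract it with queries generated from the OLAP schema, store it in database tables) can be illustrated with a small Python sketch. It is not the authors' algorithm and rests on several simplifying assumptions. The OLAP schema is reduced to a hypothetical `CubeSchema` with one fact class, a few dimension properties and one measure. RDF is held as in-memory (subject, predicate, object) tuples rather than in a triple store. Every non-type property is assumed single-valued. SQLite stands in for the ROLAP database. The `EX` namespace, the `Sale` data, the source column names and every function name are invented for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical namespace and vocabulary, for illustration only.
EX = "http://example.org/olap#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class CubeSchema:
    fact_class: str              # class whose instances become fact rows
    dimensions: dict[str, str]   # column name -> RDF property
    measure: tuple[str, str]     # (column name, RDF property)

    def columns(self) -> list[str]:
        return list(self.dimensions) + [self.measure[0]]

    def properties(self) -> list[str]:
        return list(self.dimensions.values()) + [self.measure[1]]


def rows_to_triples(rows, fact_class, column_map, id_column):
    """Step 1: map source rows to triples using a column -> property map."""
    triples = []
    for row in rows:
        subject = EX + str(row[id_column])
        triples.append((subject, RDF_TYPE, fact_class))
        for col, prop in column_map.items():
            triples.append((subject, prop, row[col]))
    return triples


def build_sparql(schema: CubeSchema) -> str:
    """Step 2a: generate a SPARQL SELECT with one triple pattern per column."""
    head = " ".join("?" + c for c in schema.columns())
    lines = [f"SELECT {head} WHERE {{", f"  ?f a <{schema.fact_class}> ."]
    for col, prop in zip(schema.columns(), schema.properties()):
        lines.append(f"  ?f <{prop}> ?{col} .")
    lines.append("}")
    return "\n".join(lines)


def extract_rows(triples, schema: CubeSchema):
    """Step 2b: evaluate the same basic graph pattern over in-memory triples.

    Assumes every non-type property is single-valued per subject.
    """
    types, values = {}, {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        else:
            values.setdefault(s, {})[p] = o
    rows = []
    for s, pv in values.items():
        if schema.fact_class in types.get(s, set()) and all(
            p in pv for p in schema.properties()
        ):
            rows.append(tuple(pv[p] for p in schema.properties()))
    return rows


def load_fact_table(conn, schema: CubeSchema, rows, table="fact"):
    """Step 3: create a fact table and insert the extracted rows."""
    cols = schema.columns()
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    marks = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return conn


schema = CubeSchema(
    fact_class=EX + "Sale",
    dimensions={"product": EX + "product", "year": EX + "year"},
    measure=("amount", EX + "amount"),
)
source = [  # illustrative source rows with their own column names
    {"id": "s1", "prod": "widget", "yr": 2007, "amt": 120},
    {"id": "s2", "prod": "widget", "yr": 2008, "amt": 80},
]
column_map = {"prod": EX + "product", "yr": EX + "year", "amt": EX + "amount"}
triples = rows_to_triples(source, schema.fact_class, column_map, "id")
print(build_sparql(schema))
conn = load_fact_table(sqlite3.connect(":memory:"), schema,
                       extract_rows(triples, schema))
print(conn.execute("SELECT product, SUM(amount) FROM fact GROUP BY product").fetchall())
# [('widget', 200)]
```

In a real deployment the generated query would be sent to a SPARQL engine over the converted RDF. The hand-written matcher in `extract_rows` only mirrors the query's basic graph pattern so that the sketch stays free of external dependencies.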
