A global and comprehensive approach for XML data warehouse design

The growing amount of valuable data stored in XML format is a major challenge for the BI community. It is therefore desirable to extract, store and integrate these large sources of information into special-purpose systems called data warehouses for further analysis and decision-making. However, compared with the well-structured relational databases of a company, XML data has a complex hierarchical structure, which renders existing traditional data warehouse approaches and techniques inappropriate. In this paper, we propose a semi-automatic approach for XML data warehouse design that starts from XML schemas as data sources. The first step consists in automatically generating a UML class diagram from a W3C XML Schema (XSD). However, the resulting diagram can be very large and hard to understand. To overcome this, we use a set of rules based on basic techniques for object-oriented design quality to develop a simplification algorithm that efficiently generates high-quality diagrams with a limited number of classes. Then, we propose a multidimensional (MD) element extraction algorithm that automatically identifies facts, measures and their corresponding dimensions. We also present a new metric for ranking the obtained MD schemas according to their relevance. The final step consists in automatically generating the star XML schema that corresponds to the XML data warehouse schema. We have implemented our approach in Java and evaluated the resulting tool on several XML schemas.
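To make the MD element extraction step more concrete, the following is a minimal sketch in Java of one common heuristic, not the algorithm proposed in the paper. It assumes that a class is a fact candidate when it has at least one numeric attribute (a measure) and at least one many-to-one association (a dimension). The types `UmlClass` and `MdSketch`, the list of numeric XSD types, and the sample diagram are all hypothetical and introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one class in the UML class diagram.
final class UmlClass {
    final String name;
    // Attribute name -> XSD type, kept in declaration order.
    final Map<String, String> attributes = new LinkedHashMap<>();
    // Names of classes reached through many-to-one associations.
    final List<String> manyToOneTargets = new ArrayList<>();

    UmlClass(String name) { this.name = name; }

    UmlClass attr(String attrName, String xsdType) {
        attributes.put(attrName, xsdType);
        return this;
    }

    UmlClass refersTo(String target) {
        manyToOneTargets.add(target);
        return this;
    }
}

final class MdSketch {
    // Assumption: these XSD types are treated as numeric, hence as measures.
    private static final List<String> NUMERIC = Arrays.asList(
        "xs:int", "xs:integer", "xs:long", "xs:decimal", "xs:float", "xs:double");

    // Measures of a class: its numerically typed attributes.
    static List<String> measuresOf(UmlClass c) {
        List<String> measures = new ArrayList<>();
        for (Map.Entry<String, String> e : c.attributes.entrySet()) {
            if (NUMERIC.contains(e.getValue())) {
                measures.add(e.getKey());
            }
        }
        return measures;
    }

    // Fact candidates: classes with at least one measure and one dimension.
    static List<String> factCandidates(List<UmlClass> diagram) {
        List<String> facts = new ArrayList<>();
        for (UmlClass c : diagram) {
            if (!measuresOf(c).isEmpty() && !c.manyToOneTargets.isEmpty()) {
                facts.add(c.name);
            }
        }
        return facts;
    }

    // Hypothetical sample diagram: a sale that refers to a product and a customer.
    static List<UmlClass> sample() {
        UmlClass sale = new UmlClass("Sale")
            .attr("quantity", "xs:integer")
            .attr("amount", "xs:decimal")
            .refersTo("Product")
            .refersTo("Customer");
        UmlClass product = new UmlClass("Product")
            .attr("name", "xs:string")
            .attr("category", "xs:string");
        UmlClass customer = new UmlClass("Customer")
            .attr("name", "xs:string")
            .attr("city", "xs:string");
        return Arrays.asList(sale, product, customer);
    }

    public static void main(String[] args) {
        for (UmlClass c : sample()) {
            if (factCandidates(sample()).contains(c.name)) {
                System.out.println(c.name + " measures=" + measuresOf(c)
                    + " dimensions=" + c.manyToOneTargets);
            }
        }
    }
}
```

In this sketch only `Sale` qualifies as a fact, with `quantity` and `amount` as measures and `Product` and `Customer` as dimensions. A real extraction step would also have to handle nested elements, optional attributes and dimension hierarchies, which this illustration leaves out.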
