Converting XML DTDs to UML diagrams for conceptual data integration

Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web, e.g., in B2B e-commerce. Modern enterprises need to combine data from many sources in order to answer important business questions, which creates a need for integration of web-based XML data. Previous web-based data integration efforts have focused almost exclusively on the logical level of data models, leaving a need for techniques that work at the conceptual level and communicate the structure and properties of the available data to users at a higher level of abstraction. The most widely used conceptual model at the moment is the Unified Modeling Language (UML). This paper presents algorithms for automatically constructing UML diagrams from XML DTDs, enabling fast and easy graphical browsing of XML data sources on the web. The algorithms capture important semantic properties of the XML data, such as precise cardinalities and aggregation (containment) relationships between the data elements. As a motivating application, it is shown how the generated diagrams can be used for the conceptual design of data warehouses based on web data, and an integration architecture is presented. The choice of data warehouses and On-Line Analytical Processing (OLAP) as the motivating application is another distinguishing feature of the presented approach.
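To give a rough feel for what capturing cardinalities and containment from a DTD can involve, the following is a minimal sketch, not the algorithms of the paper. It assumes that a DTD occurrence indicator (none, `?`, `*`, `+`) on a child element can be read as a UML multiplicity (`1`, `0..1`, `0..*`, `1..*`) on an aggregation from the parent to the child. The function name `dtd_to_aggregations`, the sample DTD, and the restriction to flat, comma-separated content models are all assumptions made for illustration; nested groups, choice (`|`), attributes, and entities are deliberately ignored.

```python
import re

# Assumed mapping from DTD occurrence indicators to UML multiplicities.
MULTIPLICITY = {"": "1", "?": "0..1", "*": "0..*", "+": "1..*"}

# Matches declarations with a single flat group, e.g. <!ELEMENT a (b, c*)>.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+\(([^()]*)\)\s*>")
# Matches one child name with an optional occurrence indicator.
CHILD = re.compile(r"^([\w.:-]+)([?*+]?)$")


def dtd_to_aggregations(dtd_text):
    """Return (parent, child, multiplicity) triples for flat sequence models.

    Hypothetical helper: anything it does not understand (such as #PCDATA
    or choice groups) is silently skipped.
    """
    edges = []
    for parent, body in ELEMENT_DECL.findall(dtd_text):
        for part in body.split(","):
            match = CHILD.match(part.strip())
            if match:
                name, indicator = match.groups()
                edges.append((parent, name, MULTIPLICITY[indicator]))
    return edges


# Illustrative DTD fragment, not taken from the paper.
SAMPLE_DTD = """
<!ELEMENT order (customer, item+, note?)>
<!ELEMENT customer (#PCDATA)>
<!ELEMENT item (name, option*)>
"""

if __name__ == "__main__":
    for parent, child, mult in dtd_to_aggregations(SAMPLE_DTD):
        # Each triple reads as: parent aggregates `mult` instances of child.
        print(f"{parent} <>-- {mult} {child}")
```

Each resulting triple could then be drawn as an aggregation edge between two classes, with the multiplicity written at the child end. A full treatment would also need to handle nested and alternative content models, which this sketch leaves out.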
