Data Integration Based on Data Conversion and Restructuring

With the development of the World Wide Web, the integration of heterogeneous data sources has become a major concern. Appropriate architectures and query languages have been proposed, but the problem of data conversion remains largely unexplored. We present the YAT system for data conversion, which provides tools for the specification and implementation of data conversions among heterogeneous data sources. It relies on a middleware model, a declarative language to describe conversion/integration programs, a graphical representation for the language, and several mechanisms that make it easy to reuse existing programs. The model is based on named trees with ordered and labeled nodes. Like semistructured data models, it is simple enough to facilitate the representation of any data. The main originality of the model is its ability to capture various levels of representation. A YAT model can be instantiated into another, more specific (eventually "ground"), model. This novel feature is essential, first, for allowing customization of conversion/integration programs. It is also a key component of the type verification that can be used to validate conversion programs. Finally, it is central to allowing conversion programs to be combined (in parallel) or composed (sequentially) in a coherent manner. The YAT conversion language (YATL) is declarative and rule-based, and features enhanced pattern matching facilities and powerful restructuring primitives. It can preserve or reconstruct the order of collections. We also present the architecture, implementation and practical use of the YAT prototype.
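The idea of instantiating one tree model into a more specific one can be illustrated with a minimal sketch. It is not YAT's actual model or YATL syntax: the names `Node`, `ANY` and `is_instance`, the wildcard convention, and the sample labels are all assumptions made for illustration. It only shows ordered, labeled trees in which a general pattern is refined step by step down to ground data.

```python
from dataclasses import dataclass, field
from typing import List

ANY = "*"  # hypothetical wildcard label: stands for any subtree


@dataclass
class Node:
    """An ordered tree node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def is_instance(specific: Node, general: Node) -> bool:
    """Return True if `specific` can be obtained from `general` by
    replacing wildcards with subtrees (children compared in order)."""
    if general.label == ANY:
        return True  # a wildcard accepts any subtree
    if general.label != specific.label:
        return False
    if len(general.children) != len(specific.children):
        return False
    return all(
        is_instance(s, g)
        for s, g in zip(specific.children, general.children)
    )


# A general model, a more specific model, and ground data (no wildcards).
general = Node("artist", [Node("name", [Node(ANY)]),
                          Node("works", [Node(ANY)])])
specific = Node("artist", [Node("name", [Node("Monet")]),
                           Node("works", [Node(ANY)])])
ground = Node("artist", [Node("name", [Node("Monet")]),
                         Node("works", [Node("Nympheas")])])

# Same content with the children in a different order.
swapped = Node("artist", [Node("works", [Node("Nympheas")]),
                          Node("name", [Node("Monet")])])
```

In this sketch, `ground` is an instance of `specific`, which is itself an instance of `general`, while `swapped` is rejected because children are matched positionally, so node order is significant.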
