Araneus in the Era of XML

A large body of recent research has been motivated by the attempt to extend database manipulation techniques to data on the Web (see [13] for a survey). Most of these research efforts, which range from the definition of Web query languages and the related optimizations, to systems for Web site development and management, and to integration techniques, started before XML was introduced, and have therefore long struggled to handle the highly heterogeneous nature of HTML pages. In the meantime, Web data sources have evolved from small, home-made collections of HTML pages into complex platforms for distributed data access and application development, and XML promises to impose itself as a more appropriate format for this new breed of Web sites. XML brings data on the Web closer to databases since, unlike HTML, it is based on a clean distinction between the way the data, its logical structure (the DTD), and the chosen presentation (the stylesheet) are specified. By virtue of this, most of the early research proposals for data management on the Web are now being reconsidered in this new perspective (see [4] for a collection of references).
In this paper, we discuss the impact of XML on the research work conducted in the last few years by our group in the framework of the ARANEUS project. ARANEUS started as an attempt to investigate the chances of re-applying traditional database concepts and abstractions, such as those of data model and query language, to data on the Web. In this spirit, we have developed several tools and techniques to handle both structured and semistructured data, in the Web style, as follows: (i) a data model called ADM for modeling Web documents and hypertexts [8]; (ii) languages for wrapping [12, 16, 14] and querying [8, 17] Web sites; (iii) tools and techniques for Web site design [9] and implementation [18]. An interesting question is how these tools and applications will work in the era of XML.
In the following sections, we try to answer this question by emphasizing how XML fits in this framework, how it is influencing our ideas and our way of thinking about Web data sources, and the design choices needed to adapt our tools, originally conceived for HTML, to the management of XML data. However, a word of caution is needed here: although very popular, XML is still a new proposal, and there are very few (if any) real XML-based applications on the Web. It is therefore quite difficult to reason about the data management problems that will come with XML, since we have not experienced them yet. Because of this, our development will be mainly informal; we will discuss some of the choices and tradeoffs related to modeling (in Section 2), querying (in Section 3), and developing (in Section 4) XML data sources.

[1] Heikki Mannila, et al. A Structured Document Database System, 1990.

[2] Elke A. Rundensteiner, et al. OQL_SERF: an ODMG implementation of the template-based schema evolution framework, 1998, CASCON.

[3] Guido Moerkotte, et al. Querying documents in object databases, 1997, International Journal on Digital Libraries.

[4] Bertrand Meyer, et al. Applying 'design by contract', 1992, Computer.

[5] Elke A. Rundensteiner, et al. Re-usable ODMG-based Templates for Web View Generation and Restructuring, 1998, Workshop on Web Information and Data Management.

[6] Serge Abiteboul, et al. From structured documents to novel query facilities, 1994, SIGMOD '94.

[7] Barbara Lerner, et al. A model for compound type changes encountered in schema evolution, 2000, TODS.

[8] Jay Banerjee, et al. Semantics and implementation of schema evolution in object-oriented databases, 1987, SIGMOD '87.

[9] David Jordan, et al. The Object Database Standard: ODMG 2.0, 1997.

[10] Valter Crescenzi, et al. Grammars Have Exceptions, 1998, Inf. Syst.

[11] Elke A. Rundensteiner, et al. SERF: ODMG-based generic re-structuring facility, 1999, SIGMOD '99.

[12] Paolo Merialdo, et al. To Weave the Web, 1997, VLDB.

[13] Frank Wm. Tompa, et al. Text / Relational Database Management Systems: Harmonizing SQL and SGML, 1994, ADB.

[14] Dan Suciu. Managing Web data, 1999, SIGMOD '99.

[15] Elke A. Rundensteiner, et al. SERF: schema evolution through an extensible, re-usable and flexible framework, 1998, CIKM '98.

[16] Philippe Brèche, et al. Advanced Principles for Changing Schemas of Object Databases, 1996, CAiSE.

[17] Anthony Kosky, et al. WOL: a language for database transformations and constraints, 1997, Proceedings 13th International Conference on Data Engineering.

[18] Paolo Merialdo, et al. Design and Maintenance of Data-Intensive Web Sites, 1998, EDBT.

[19] David Maier, et al. The GemStone Data Management System, 1989, Object-Oriented Concepts, Databases, and Applications.

[20] Alberto O. Mendelzon, et al. Database techniques for the World-Wide Web: a survey, 1998, SGMD.

[21] Valter Crescenzi, et al. The (Short) Araneus Guide to Web-Site Development, 1999, WebDB.

[22] Stéphane Grumbach, et al. In Search of the Lost Schema, 1999, ICDT.