A Web Odyssey: from Codd to XML

The Web presents the database area with vast opportunities and commensurate challenges. Databases and the Web are organically connected at many levels. Web sites are increasingly powered by databases. Collections of linked Web pages distributed across the Internet are themselves tempting targets for a database. The emergence of XML as the lingua franca of the Web brings some much-needed order and will greatly facilitate the use of database techniques to manage Web information.

This paper will discuss some of the developments related to the Web from the viewpoint of database theory. As we shall see, the Web scenario requires revisiting some of the basic assumptions of the area. To be sure, database theory remains as valid as ever in the classical setting, and the database industry will continue to represent a multi-billion-dollar target of applicability for the foreseeable future. But the Web represents an opportunity of an entirely different scale. We are thus at an important juncture. Database theory could retain its classical focus and turn inwards. Or, it could take on the challenge of the Web head-on and contribute to an important part of its formal foundations. To do so, it will have to leave its familiar shores and reinvent itself. There are good signs that the journey has already begun.

What makes the Web scenario different from classical databases? In short, everything. A classical database is a coherently designed system. The system imposes rigid structure and provides queries and updates, as well as transactions, concurrency, integrity, and recovery, in a controlled environment. The Web escapes any such control. It is a free-evolving, ever-changing collection of data sources of various
