Bulletin of the Technical Committee on

This paper presents a brief overview of data management using the Extensible Markup Language (XML). It presents the basics of XML and the DTDs used to constrain XML data, and describes metadata management using RDF. It also discusses how XML data is queried, referenced, and transformed using the stylesheet language XSLT and the referencing mechanisms XPath and XPointer.

1 Describing XML Data

The Extensible Markup Language (XML) [BPSM98] models data as a tree of elements that contain character data and have attributes composed of name-value pairs. For example, here is an XML representation of catalog information for a book:

    <book>
      <title>The spy who came in from the cold</title>
      <author>John <lastname>Le Carre</lastname></author>
      <price currency="USD">5.59</price>
      <review><author>Ben</author>Perhaps one of the finest...</review>
      <review><author>Jerry</author>An intriguing tale of...</review>
      <bestseller authority="NY Times"/>
    </book>

Text delimited by angle brackets (<...>) is markup, while the rest is character data. (Here, and in the rest of this paper, we introduce concepts informally as needed for our discussion; for formal specifications, see [W3C99].) Elements may contain a mix of character data and other elements; e.g., the book element contains the text “Here are some...” in addition to elements such as title and price. The element named title contains character data denoting the book title and is contained in the book element. Similarly, the element price contains character data denoting the book’s price. This element also has an attribute named currency with value USD, represented using the syntax attribute-name="attribute-value" within the element’s start-tag.

In general, element names are not unique; e.g., the book element in our example contains two review elements. However, attribute names are unique within an element; e.g., the price element cannot have another attribute named currency.
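As an informal illustration of this data model, the following sketch loads the book example with Python's standard xml.etree.ElementTree module and reads back character data, an attribute, and the repeated review elements. The choice of Python and the variable names are assumptions made for illustration; the paper itself does not prescribe any API.

```python
import xml.etree.ElementTree as ET

# The catalog example from the text, reproduced verbatim.
BOOK_XML = """<book>
<title>The spy who came in from the cold</title>
<author>John <lastname>Le Carre</lastname></author>
<price currency="USD">5.59</price>
<review><author>Ben</author>Perhaps one of the finest...</review>
<review><author>Jerry</author>An intriguing tale of...</review>
<bestseller authority="NY Times"/>
</book>"""

book = ET.fromstring(BOOK_XML)

# Character data directly inside an element is exposed as .text.
title = book.find("title").text

# Attributes are name-value pairs attached to an element.
price = book.find("price")
currency = price.get("currency")
amount = price.text

# Element names are not unique: book has two review children.
reviews = book.findall("review")
reviewers = [review.find("author").text for review in reviews]

# Mixed content: the author element holds the text "John " followed by
# a nested lastname element.
author = book.find("author")
first_name = author.text.strip()
last_name = author.find("lastname").text

# In a review, the character data follows the nested author element, so
# ElementTree exposes it as that child's .tail rather than as review.text.
first_review_text = reviews[0].find("author").tail

print(title, currency, amount, reviewers, first_name, last_name)
```

Note that find("author") on the book element matches only direct children, so it returns the book's author and not the author elements nested inside the reviews.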
The syntax permits an empty element <bestseller></bestseller> to be represented more concisely as <bestseller/>. XML documents are called well-formed if they satisfy simple syntactic constraints, such as proper delimiting of element names and attributes and proper nesting of start and end tags.

Copyright 1999 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Bulletin of the IEEE Computer Society Technical Committee on Data Engineering
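The well-formedness constraints of Section 1 can be tried out with any conforming XML parser. The sketch below, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named is_well_formed, treats a document as well-formed exactly when the parser accepts it. It also checks that the two spellings of an empty element yield the same tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML.

    Hypothetical helper for illustration; it checks syntax only and
    says nothing about validity against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The long and short forms of an empty element describe the same tree,
# so they serialize identically.
long_form = ET.fromstring("<bestseller></bestseller>")
short_form = ET.fromstring("<bestseller/>")
same_tree = ET.tostring(long_form) == ET.tostring(short_form)

samples = {
    # Properly nested start and end tags.
    "nested": "<book><title>Spy</title></book>",
    # End tags in the wrong order: improper nesting.
    "crossed": "<book><title>Spy</book></title>",
    # Attribute value without quotes: improper delimiting.
    "unquoted": "<price currency=USD>5.59</price>",
    # Attribute names must be unique within an element.
    "duplicate": '<price currency="USD" currency="EUR">5.59</price>',
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample is accepted; the other three each violate one of the constraints named in the text.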
