XSD: A Hierarchical Access Method for Indexing XML Schemata

Search operations and browsing facilities over an XML document database require special support at the physical level. Typical search operations involve path queries. This paper proposes a hierarchical access method that supports such operations and facilitates browsing. It advocates searching large XML collections by efficiently managing their XML schemata. The approach can be used to index XML documents according to their structural proximity. This is achieved by organizing the schemata of a large XML document collection hierarchically and merging structurally close schemata. The resulting structure, called the XML Schema Directory (XSD), is a balanced tree that serves two purposes: (1) accelerating XML query processing and (2) facilitating browsing.
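The abstract does not say how XSD measures structural proximity or merges schemata. The Python sketch below only illustrates the general idea under assumptions of our own:

- Each schema is reduced to its set of root-to-node label paths.
- Proximity is the Jaccard distance between those path sets.
- Merging two schemata is the union of their path sets.
- The tree is built bottom-up, one level at a time, so that all leaves sit at the same depth.

The names `schema_paths`, `distance`, `DirNode`, `build` and `query` are hypothetical and are not the authors' API or algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch of a schema directory; NOT the XSD algorithm from the paper.
LabelPath = tuple[str, ...]  # e.g. ("book", "author", "name")


def schema_paths(tree: dict) -> frozenset[LabelPath]:
    """Flatten a nested-dict schema tree into its set of root-to-node label paths."""
    paths: set[LabelPath] = set()

    def walk(node: dict, prefix: LabelPath) -> None:
        for label, child in node.items():
            path = prefix + (label,)
            paths.add(path)
            walk(child, path)

    walk(tree, ())
    return frozenset(paths)


def distance(a: frozenset[LabelPath], b: frozenset[LabelPath]) -> float:
    """Jaccard distance between path sets; an assumed stand-in for structural proximity."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


@dataclass
class DirNode:
    merged: frozenset[LabelPath]  # union of every schema stored below this node
    children: list[DirNode] = field(default_factory=list)
    doc_ids: list[str] = field(default_factory=list)  # populated only on leaves


def build(leaves: list[DirNode], fanout: int = 2) -> DirNode:
    """Group structurally close nodes level by level; every leaf ends at the same depth."""
    if not leaves:
        raise ValueError("need at least one schema")
    level = leaves
    while len(level) > 1:
        remaining = list(level)
        next_level: list[DirNode] = []
        while remaining:
            seed = remaining.pop(0)
            # Greedily take the fanout-1 nodes closest to the seed (assumed heuristic).
            remaining.sort(key=lambda n: distance(seed.merged, n.merged))
            group = [seed] + remaining[: fanout - 1]
            remaining = remaining[fanout - 1:]
            merged = frozenset().union(*(n.merged for n in group))
            next_level.append(DirNode(merged, group))
        level = next_level
    return level[0]


def query(node: DirNode, path: LabelPath) -> list[str]:
    """Return ids of documents whose schema contains `path`, pruning non-matching subtrees."""
    if path not in node.merged:
        return []  # no schema below this node contains the path
    if not node.children:
        return list(node.doc_ids)
    result: list[str] = []
    for child in node.children:
        result.extend(query(child, path))
    return result


docs = {
    "d1": {"book": {"title": {}, "author": {"name": {}}}},
    "d2": {"book": {"title": {}, "author": {"name": {}}, "isbn": {}}},
    "d3": {"movie": {"title": {}, "director": {}}},
    "d4": {"movie": {"title": {}, "cast": {"actor": {}}}},
}
root = build([DirNode(schema_paths(t), doc_ids=[d]) for d, t in docs.items()])
print(sorted(query(root, ("book", "author", "name"))))  # ['d1', 'd2']
```

Every internal node stores the union of the paths below it, so a path query can skip any subtree whose merged schema lacks the path. In this example, the `book` query never descends into the subtree that groups the two `movie` schemata.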