Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web

XML has become a popular format for data interchange and storage, which has led to the rise of both XML-enabled relational databases and native XML databases. This paper outlines a data definition and manipulation language for XML repositories that lets users perform data management tasks such as creating and deleting indices, collections, and documents. The proposed language also supports queries, transformations, and updates on documents in the repository, either singly or across an entire collection. Its syntax is presented in two forms: as extensions to the W3C’s XML Query language (XQuery), and as a new language that borrows heavily from SQL (for the relational model) and from DL/I of IBM’s IMS (for the hierarchical model). A prototype implementation of the language has been partially completed.
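To make the distinction between operating on a single document and operating across a whole collection concrete, here is a minimal Python sketch of a toy in-memory repository. It is not the language proposed in the paper, nor its prototype: the `Repository` class, its method names, and the sample documents are all hypothetical, and the path expressions are limited to the XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


class Repository:
    """Hypothetical toy repository: collection name -> {document name -> root element}."""

    def __init__(self):
        self.collections = {}

    # Data definition: create and drop collections, store and delete documents.
    def create_collection(self, name):
        self.collections.setdefault(name, {})

    def drop_collection(self, name):
        self.collections.pop(name, None)

    def store(self, collection, doc_name, xml_text):
        self.collections[collection][doc_name] = ET.fromstring(xml_text)

    def delete(self, collection, doc_name):
        self.collections[collection].pop(doc_name, None)

    def _targets(self, collection, doc_name):
        # One named document, or every document in the collection (sorted by name).
        docs = self.collections[collection]
        return [doc_name] if doc_name is not None else sorted(docs)

    # Data manipulation: query or update one document or the whole collection.
    def query(self, collection, path, doc_name=None):
        docs = self.collections[collection]
        return [
            (name, element.text)
            for name in self._targets(collection, doc_name)
            for element in docs[name].findall(path)
        ]

    def update(self, collection, path, new_text, doc_name=None):
        docs = self.collections[collection]
        count = 0
        for name in self._targets(collection, doc_name):
            for element in docs[name].findall(path):
                element.text = new_text
                count += 1
        return count  # number of elements modified


repo = Repository()
repo.create_collection("books")
repo.store("books", "a.xml", "<book><title>XML Basics</title><price>30</price></book>")
repo.store("books", "b.xml", "<book><title>Query Languages</title><price>45</price></book>")

single = repo.query("books", "title", doc_name="a.xml")  # one document
everything = repo.query("books", "title")                # entire collection
changed = repo.update("books", "price", "40", doc_name="a.xml")
```

The optional `doc_name` argument is the only design point worth noting: omitting it widens the same operation from one document to the collection, which mirrors the "singly or across an entire collection" behaviour described above. Index management and transformations are left out of the sketch.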
