The Nimble XML data integration system

For better or for worse, XML has emerged as a de facto standard for data interchange. This consensus is likely to increase demand for technology that lets users integrate data from a variety of applications, repositories, and partners located across the corporate intranet or on the Internet. Nimble Technology has spent two years developing a product to serve this market. Originally conceived after decades of person-years of research on data integration, the product is now being deployed at several Fortune-500 beta-customer sites. This article reports on the key challenges we faced in designing the product and highlights some issues that require more attention from the research community. In particular, we address architectural issues that arise from designing a product with XML as its core representation, choices in the design of the underlying algebra, on-the-fly data cleaning, and caching and materialization policies.
