Virtualizing Heterogeneous Data Sources on the Grid - Design Concepts and Implementation

Virtualization is one of the key features of the Grid. As data-intensive applications gain importance in both the commercial and scientific sectors, it becomes crucial for the success of the Grid to provide transparent access to distributed, heterogeneous data sources as well. This paper presents the concepts, the prototype, and performance results of the Grid Data Mediation Service developed within the GridMiner project conducted in Vienna, Austria. Various data sources, such as relational and XML databases and comma-separated value files, can be combined via an extensible and flexible Mapping Schema. User-defined Java functions (static and dynamic) can be included in the mediation process to overcome various kinds of heterogeneity. To show the feasibility of the developed concepts and to lay the basis for a dynamic version, they have been seamlessly integrated into the Grid Data Service of OGSA-DAI to provide a non-proprietary, centralized, and easy-to-integrate solution for virtualizing heterogeneous data sources on the Grid. Although the prototype was developed within the GridMiner project, it is not limited to it and can be used in other projects to reduce effort and hide complexity when accessing diverse data sources through a subset of SQL.
