A Web-based Mapping Technique for Establishing Metadata Interoperability

Integrating metadata from different, heterogeneous data sources requires metadata interoperability, a property that is not given by default. Metadata mapping techniques allow domain experts to establish metadata interoperability in a specific integration context, and mapping solutions are meant to provide the necessary support. While such solutions already exist for the established field of interoperable databases, this is not the case for Web environments. Given the ever-growing volume of structured metadata and metadata schemas on the Web, a need for Web-based mapping solutions is emerging. The core of such a solution is a mapping model that defines the language constructs required to specify mappings. Existing Semantic Web languages such as RDFS or OWL offer basic mapping elements (e.g., owl:equivalentProperty, owl:sameAs), but they do not address the full spectrum of semantic and structural heterogeneities that can occur between different, incompatible metadata objects. In addition, technical approaches for translating previously defined mappings into executable queries are missing. As its central scientific contribution, this dissertation presents an abstract mapping model that reflects the mapping problem on a generic level and offers approaches for reconciling incompatible schemas. Instance transformation functions and URIs play a central role in this model: the former bridge a broad spectrum of possible semantic and structural heterogeneities, while the latter embed the mapping model in the architecture of the World Wide Web.
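The role of instance transformation functions and URIs can be illustrated with a minimal sketch. It is not the dissertation's mapping model: the class `MappingElement`, the function `invert_name`, the helper `apply_mapping`, and the `example.org` mapping URI are all hypothetical names chosen for illustration. The sketch only shows the general idea that a mapping is itself identified by a URI, relates two schema elements by their URIs, and carries a function that converts instance values.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class MappingElement:
    """Hypothetical mapping element; not the dissertation's actual model."""
    uri: str                          # URI identifying the mapping itself
    source_property: str              # URI of the source schema element
    target_property: str              # URI of the target schema element
    transform: Callable[[str], str]   # instance transformation function


def invert_name(value: str) -> str:
    """Turn 'Family, Given' into 'Given Family'; other values pass through."""
    family, _, given = value.partition(",")
    return f"{given.strip()} {family.strip()}".strip()


def apply_mapping(mapping: MappingElement, record: Dict[str, str]) -> Dict[str, str]:
    """Translate one source value into the target schema, if it is present."""
    if mapping.source_property not in record:
        return {}
    value = record[mapping.source_property]
    return {mapping.target_property: mapping.transform(value)}


# Assumed example: map a Dublin Core creator to a FOAF name.
creator_to_name = MappingElement(
    uri="http://example.org/mappings/creator-to-name",  # hypothetical URI
    source_property="http://purl.org/dc/elements/1.1/creator",
    target_property="http://xmlns.com/foaf/0.1/name",
    transform=invert_name,
)

source_record = {"http://purl.org/dc/elements/1.1/creator": "Haslhofer, Bernhard"}
target_record = apply_mapping(creator_to_name, source_record)
```

Here the transformation function resolves a structural difference in how names are written, which a plain equivalence statement such as owl:equivalentProperty could not express on its own.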
On a concrete, language-specific level, we present the binding of the abstract model to the RDF Vocabulary Description Language (RDFS), which enables mappings between different metadata schemas expressed in RDFS. The mapping model is embedded in a cyclic mapping process that categorizes the requirements for mapping solutions into four subsequent phases: mapping discovery, mapping representation, mapping execution, and mapping maintenance. In this dissertation we focus mainly on the representation phase and on the transformation of mapping specifications into executable SPARQL queries. To support the discovery phase, the mapping model provides an interface for plugging in schema or ontology matching algorithms. For the maintenance phase, we present a simple but fit-for-purpose mapping registry concept. Based on the mapping model, we introduce a Web-based mediator-wrapper architecture that allows domain experts to define SPARQL mediation interfaces. The data sources to be integrated must be encapsulated by wrapper components, which expose the contained metadata on the Web and provide SPARQL access. As an exemplary wrapper component, we present the OAI2LOD Server, which makes it possible to integrate data sources that expose their metadata via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In a case study, we show how mappings can be created in Web environments and how, after a few simple configuration steps, our mediator-wrapper architecture can integrate metadata from different, heterogeneous data sources without requiring a mapping solution to be installed in a local system environment.
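One conceivable way to turn a mapping specification into an executable SPARQL query is sketched below. This is an assumption for illustration only and is not taken from the dissertation, which may generate or rewrite queries differently. The function `mapping_to_construct` and its parameters are hypothetical, and the sketch relies on `BIND`, which is a SPARQL 1.1 feature.

```python
from typing import Optional


def mapping_to_construct(
    source_property: str,
    target_property: str,
    value_expr: Optional[str] = None,
) -> str:
    """Build a SPARQL CONSTRUCT query for one property-level mapping.

    value_expr is an optional SPARQL expression over ?in that plays the
    role of an instance transformation; by default the value is copied.
    """
    expr = value_expr or "?in"
    lines = [
        f"CONSTRUCT {{ ?s <{target_property}> ?out }}",
        "WHERE {",
        f"  ?s <{source_property}> ?in .",
        f"  BIND({expr} AS ?out)",
        "}",
    ]
    return "\n".join(lines)


# Assumed example: expose dc:title values as upper-cased dcterms:title values.
query = mapping_to_construct(
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/terms/title",
    value_expr="UCASE(STR(?in))",
)
print(query)
```

A query built this way could be sent to the SPARQL endpoint of a wrapper component, so that the mapping is executed where the source metadata is exposed rather than in a locally installed tool.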
