Interweaving OAI-PMH data sources with the linked data cloud

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is widely adopted for exchanging bibliographic metadata. In parallel, the W3C's Linking Open Data Initiative exposes and interlinks structured data from a variety of sources on the web. Because many of these sources contain information that is valuable for institutional repositories (e.g., shared concept definitions and thesauri), we believe that institutions currently exposing their data via OAI-PMH can benefit from integrating their metadata with the data available in the Linked Data cloud. Achieving such an integration requires overcoming the OAI-PMH-specific protocol characteristics that currently prevent OAI-PMH metadata from being interoperable with the Linked Data approach to exposing data. As the first contribution of this paper, we describe a possible solution for exposing OAI-PMH metadata on the web as part of the Linked Data cloud. As the second contribution, we present a rule-based mechanism for linking these metadata with other relevant data sources, together with a case study that describes possible linking scenarios for three representative OAI-PMH data providers. Finally, we discuss the quality criteria that OAI-PMH metadata must meet to benefit from data exposed by other Linked Data sources.
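
To make the idea of a rule-based linking mechanism more concrete, the following is a minimal sketch in Python; it is not the rule format or implementation used in the paper, which the abstract does not specify. It assumes a rule names a property on the harvested record, a property on the external resource, a link predicate, and a similarity threshold. The `LinkRule` class, the `generate_links` function, the `example.org` URIs, and the choice of `owl:sameAs` are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string similarity metric a real deployment would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class LinkRule:
    """Hypothetical linking rule; field names are illustrative assumptions."""
    source_property: str  # property on the harvested OAI-PMH record
    target_property: str  # property on the external Linked Data resource
    link_predicate: str   # predicate of the generated link
    threshold: float      # minimum similarity in [0, 1] required for a link


def similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity (stand-in metric)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def generate_links(rule, source, target):
    """Compare every source resource with every target resource.

    `source` and `target` map a resource URI to a dict of property values.
    Returns (subject, predicate, object) triples for pairs whose property
    values are at least `rule.threshold` similar.
    """
    links = []
    for s_uri, s_props in source.items():
        s_value = s_props.get(rule.source_property)
        if s_value is None:
            continue
        for t_uri, t_props in target.items():
            t_value = t_props.get(rule.target_property)
            if t_value is None:
                continue
            if similarity(s_value, t_value) >= rule.threshold:
                links.append((s_uri, rule.link_predicate, t_uri))
    return links


# Usage with made-up data: link a record's creator to an external resource.
rule = LinkRule("dc:creator", "rdfs:label", "owl:sameAs", 0.9)
source = {"http://example.org/oai/record/1": {"dc:creator": "Berners-Lee, Tim"}}
target = {
    "http://example.org/people/tbl": {"rdfs:label": "berners-lee, tim "},
    "http://example.org/people/cl": {"rdfs:label": "Lagoze, Carl"},
}
links = generate_links(rule, source, target)
```

The pairwise comparison is quadratic in the number of resources, so a sketch like this only suits small data sets; the threshold trades missed links against false ones and would need tuning per data provider.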
