The OpenAIRE Workflows for Data Management

The OpenAIRE initiative is the point of reference for Open Access in Europe. It aims to create an e-Infrastructure for the free flow, access, sharing, and re-use of research outcomes, services, and processes, in order to advance research and disseminate scientific knowledge. OpenAIRE makes openly accessible a rich Information Space Graph (ISG) in which products of the research life-cycle (e.g. publications, datasets, projects) are semantically linked to each other. The graph is constructed by a set of autonomic (orchestrated) workflows operating in a regime of continuous data integration. This paper discusses the principal workflows operated by the OpenAIRE technical infrastructure in its different functional areas, outlining the challenges faced and the solutions realized.
