Interoperability in an Infrastructure Enabling Multidisciplinary Research: The case of CLARIN

CLARIN is a European Research Infrastructure providing access to language resources and technologies for researchers in the humanities and social sciences. It supports the use and study of language data in general and aims to increase the potential for comparative research on cultural and societal phenomena across the boundaries of languages and disciplines, in line with the European agenda for Open Science. Data infrastructures such as CLARIN have recently begun to engage with emerging frameworks for the federation of infrastructural services, such as the European Open Science Cloud, and with the integration of services resulting from multidisciplinary collaboration into federated services for the wider social sciences and humanities (SSH) domain. In this paper we describe the interoperability requirements that arise from these existing ambitions and emerging frameworks. We address interoperability at several levels, including organisation and ecosystem, design of workflow services, data curation, performance measurement, and collaboration.
