Ontologies in CLARIAH: Towards Interoperability in History, Language and Media

One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions, whether by increasing the scale of scholarly studies, linking existing databases, or improving the accessibility of data. Here, the FAIR principles provide a useful framework, as they state that data need to be Findable, as they are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, thus using standard knowledge representation formats and shared vocabularies; and Reusable, through adequate licensing and permissions. Integrating data from diverse humanities domains is not trivial: the research questions of scholars (such as "was economic wealth equally distributed in the 18th century?" or "what are narratives constructed around disruptive media events?") and their preparation phases (e.g. data collection, knowledge organisation, cleaning) need to be taken into account. In this chapter, we describe the ontologies and tools developed and integrated in the Dutch national project CLARIAH to address these issues across datasets from three fundamental domains or "pillars" of the humanities (linguistics, social and economic history, and media studies) that have paradigmatic data representations (textual corpora, structured data, and multimedia). We summarise the lessons learnt from using such ontologies and tools in these domains from a generalisation and reusability perspective.
