A study of semantic integration across archaeological data and reports in different languages

This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme of wooden objects and their dating via dendrochronological techniques, covering types of wooden material, samples taken, and wooden objects such as shipwrecks. The outcomes are an integrated RDF dataset and an associated interactive research Demonstrator, a query builder web application. The semantic framework combines the CIDOC Conceptual Reference Model (CRM) with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail, together with illustrative scenarios from the Demonstrator, and reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application with CRM/AAT-based data integration. Its functionality combines free-text and semantic search with browsing on semantic links and with thesaurus query expansion over hierarchical and associative relationships. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a ‘mapping pattern’ approach (via the STELETO tool) ensured the validity and consistency of all RDF output. A query builder user interface shields the user from the complexity of the underlying semantic framework. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages, and of semantic cross-searching of the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.
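The hierarchical and associative query expansion mentioned above can be illustrated with a minimal sketch. It is not the Demonstrator's implementation: the toy thesaurus, its labels, and the helper names `expand` and `values_clause` are assumptions made for illustration, and the real AAT identifies concepts by URI rather than by plain label.

```python
from collections import deque

# Hypothetical toy thesaurus (NOT real AAT data): concept -> narrower concepts.
NARROWER = {
    "wood": ["hardwood", "softwood"],
    "hardwood": ["beech", "oak"],
    "softwood": ["pine"],
}

# Hypothetical associative (related-term) links, also invented for illustration.
RELATED = {
    "oak": ["oak bark"],
}


def expand(concept, include_related=False):
    """Return the concept followed by every concept reachable from it.

    Narrower links are always followed transitively (breadth-first).
    Related links are followed as well when include_related is True.
    """
    seen = [concept]
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        neighbours = list(NARROWER.get(current, []))
        if include_related:
            neighbours += RELATED.get(current, [])
        for neighbour in neighbours:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen


def values_clause(variable, labels):
    """Format the expanded labels as a SPARQL VALUES clause (string only)."""
    quoted = " ".join(f'"{label}"' for label in labels)
    return f"VALUES ?{variable} {{ {quoted} }}"


if __name__ == "__main__":
    terms = expand("hardwood", include_related=True)
    print(values_clause("material", terms))
```

A search for "hardwood" would in this sketch also match records indexed under "beech" or "oak". Expanding on the client and passing a `VALUES` list is only one option; a SPARQL endpoint could instead traverse the hierarchy itself with a property path.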
