A linked open data framework to enhance the discoverability and impact of culture heritage

Cultural heritage institutions have recently begun to consider the benefits of sharing their collections as linked open data in order to disseminate and enrich their metadata. As datasets grow very large, ingestion, management, querying and enrichment all become challenging. Furthermore, each institution has its own particularities in important aspects such as vocabularies and interoperability, which makes it difficult to generalise the process and provide one-size-fits-all solutions. To improve the user experience of information retrieval systems, researchers have identified the need for further refinement in recognising and extracting implicit relationships expressed in natural language. We introduce a framework for the enrichment and disambiguation of locations in text using open knowledge bases such as Wikidata and GeoNames. The framework has been successfully used to publish a dataset based on information from the Biblioteca Virtual Miguel de Cervantes, illustrating how semantic enrichment can help information retrieval. We describe the methods used to automate the enrichment process, which build upon open source software components.
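As a rough illustration of what disambiguating a location mention against a knowledge base can involve, the following minimal Python sketch ranks candidate places by how many of their associated terms appear in the surrounding text. It is not the framework's actual method: the `Candidate` class, the `disambiguate` function, the in-memory `GAZETTEER` and its `kb:` identifiers are all hypothetical stand-ins for entries that a real system would retrieve from a knowledge base such as Wikidata or GeoNames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical knowledge-base entry for a place."""
    identifier: str            # placeholder ID, not a real Wikidata/GeoNames ID
    label: str                 # place name as it appears in the knowledge base
    context_terms: frozenset   # lowercase terms associated with the place


# Toy in-memory gazetteer (assumption: a real system would query a knowledge base).
GAZETTEER = [
    Candidate("kb:cordoba-es", "Córdoba", frozenset({"spain", "andalusia"})),
    Candidate("kb:cordoba-ar", "Córdoba", frozenset({"argentina"})),
]


def disambiguate(mention, context, candidates):
    """Return the candidate whose label equals the mention and whose
    associated terms overlap most with the context, or None if no label matches."""
    # Crude tokenisation: split on whitespace, strip punctuation, lowercase.
    context_tokens = {tok.strip(".,;:()").lower() for tok in context.split()}
    matching = [c for c in candidates if c.label.lower() == mention.lower()]
    if not matching:
        return None
    # On ties, max() keeps the first matching candidate in list order.
    return max(matching, key=lambda c: len(c.context_terms & context_tokens))
```

For example, calling `disambiguate("Córdoba", "He was born in Córdoba, Andalusia.", GAZETTEER)` selects the entry associated with Andalusia, because that term appears in the context. Term overlap is only one possible signal, chosen here for brevity.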
