Linked Open Citation Database: Enabling Libraries to Contribute to an Open and Interconnected Citation Graph

Citations play a crucial role in scientific discourse, in information retrieval, and in bibliometrics. Many initiatives currently promote free and open citation data. However, creating citation data is not yet part of the cataloging workflow in libraries. In this paper, we present our project Linked Open Citation Database, in which we design distributed processes and a system infrastructure based on linked data technology. The goal is to show that libraries can catalog citations efficiently using a semi-automatic approach. We describe the current state of the workflow and its implementation. We show that we significantly improved the automatic reference extraction, which is crucial for the subsequent data curation. We further give insights into the curation and linking process and provide evaluation results that not only guide the further development of the project, but also allow us to discuss its overall feasibility.
