Weaving Scholarly Legacy Data into Web of Data

The Linked Open Data project provides a new publishing paradigm for creating machine-readable, structured data on the Web. The significant presence of datasets describing scholarly publications in the Linked Data cloud underpins the importance of Linked Data for the scientific community and for the open access movement. However, these semantically rich datasets still need to be exploited by, and linked with, real-time applications. This paper reports on such an effort: we have exploited numerous scholarly datasets and created semantic links to papers in an online journal, the Journal of Universal Computer Science (J.UCS). J.UCS plays an important part in the computer science publishing community and provides a number of innovative features and datasets to its web users. However, the legacy HTML format in which these features are made available makes the data difficult for machines to understand and query. To gain the benefits of Linked Open Data, this paper presents an approach for converting J.UCS legacy HTML data into a machine-understandable format (RDF) and for interlinking this data with other important Linked Data resources. The approach has disambiguated and interlinked the J.UCS author and publication datasets with DBpedia, DBLP, CiteULike and Faceted DBLP. The triplified and interlinked datasets are available to the scientific and Semantic Web communities for download and for SPARQL querying. Researchers and semantic agents can use this semantically linked dataset to identify semantic associations, to build inferencing systems, and to extract useful knowledge.
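To make the idea of "triplifying" and interlinking concrete, the following is a minimal sketch, not the paper's actual pipeline. It assumes that an author name and a paper title have already been extracted from a legacy HTML page, and it serializes them as N-Triples with an `owl:sameAs` link to an external resource. All `example.org` IRIs, the sample name and title, and the helper names (`iri`, `literal`, `author_triples`) are hypothetical. The choice of FOAF and Dublin Core terms is likewise an assumption for illustration.

```python
# Illustrative sketch only: the vocabulary choice and all example.org IRIs
# are assumptions, not the scheme used for the J.UCS dataset.
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Join three already-serialized terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def author_triples(author_iri, name, paper_iri, title, same_as=None):
    """Describe one author and one paper, optionally linked to an external IRI."""
    lines = [
        triple(iri(author_iri), iri(RDF_TYPE), iri(FOAF + "Person")),
        triple(iri(author_iri), iri(FOAF + "name"), literal(name)),
        triple(iri(paper_iri), iri(DCTERMS + "title"), literal(title)),
        triple(iri(paper_iri), iri(DCTERMS + "creator"), iri(author_iri)),
    ]
    if same_as is not None:
        # Interlinking step: assert that the local author and the external
        # resource denote the same person.
        lines.append(triple(iri(author_iri), iri(OWL_SAME_AS), iri(same_as)))
    return lines


# Hypothetical input values standing in for data scraped from an HTML page.
statements = author_triples(
    author_iri="http://example.org/jucs/author/jane-doe",
    name='Jane "JD" Doe',
    paper_iri="http://example.org/jucs/paper/1",
    title="An Example Paper",
    same_as="http://example.org/external/Jane_Doe",
)

if __name__ == "__main__":
    print("\n".join(statements))
```

Once statements like these are loaded into a triple store, they can be queried with SPARQL. The `owl:sameAs` statement is what allows a query to follow a local author over to the externally published description.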
