Linked Data Application Development Methodology

The vast amount of data available over the distributed infrastructure of the Web has prompted the development of techniques for its representation, storage and usage. One of these techniques is the Linked Data paradigm, which aims to provide unified practices for publishing and contextually interlinking data on the Web, using World Wide Web Consortium (W3C) standards and Semantic Web technologies. This approach transforms the Web from a web of documents into a web of data: a distributed network of data which can be used by software agents and machines. The interlinked nature of the distributed datasets enables advanced use-case scenarios for end users and their applications, scenarios which were not possible over isolated data silos. This creates opportunities for generating new business value in the industry. The adoption of the Linked Data principles by data publishers from the research community and the industry has led to the creation of the Linked Open Data (LOD) Cloud, a vast collection of interlinked data published on and accessible via the existing infrastructure of the Web. The experience gained in creating these Linked Data datasets has led to the development of a few methodologies for transforming and publishing Linked Data. However, even though these methodologies cover the process of modeling, transforming or generating, and publishing Linked Data, they do not consider reuse of the work done in the individual life-cycle steps. As a result, efforts to generate Linked Data within a given domain remain separate and independent, and each of them goes through the entire set of life-cycle steps from the beginning. In this PhD thesis, based on our experience with generating Linked Data in various domains and on the existing Linked Data methodologies, we define a new Linked Data methodology with a focus on reuse.
It consists of five steps which encompass the tasks of studying the domain, modeling the data, transforming the data, publishing it and exploiting it. In each of the steps, the methodology provides guidance to data publishers on defining reusable components, in the form of tools, schemas and services, for the given domain. Future Linked Data publishers in the domain would then be able to reuse these components to go through the life-cycle steps in a more efficient and productive manner. When schemas from the domain are reused, the resulting Linked Data dataset is compatible and aligned with other datasets generated by reusing the same components, which further increases the value of the datasets. This approach aims to encourage data publishers to generate high-quality, aligned Linked Data datasets from various domains, leading to further growth in the number of datasets on the LOD Cloud, in their quality and in the scenarios for exploiting them. With the emergence of data-driven scientific fields, such as Data Science, creating and publishing high-quality Linked Data datasets on the Web is becoming even more important, as it provides an open dataspace built on existing Web standards. Such a dataspace enables data scientists to perform data analytics over the cleaned, structured and aligned data in it, in order to produce new knowledge and introduce new value in a given domain. As the Linked Data principles are also applicable within closed environments over proprietary data, the same methods and approaches are applicable in the enterprise domain as well.
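
The idea of a reusable schema component in the transformation step can be illustrated with a minimal sketch. It is not the tooling described in the thesis; it only assumes that a domain schema can be captured once as a mapping from source column names to predicate IRIs and then applied by any publisher in the same domain. The names DRUG_SCHEMA, to_ntriples and escape_literal, the example.org base IRI, the sample records and the choice of schema.org terms are all hypothetical, and every value is emitted as a plain string literal for simplicity.

```python
from urllib.parse import quote

# Hypothetical reusable schema component for one domain: it maps source
# column names to predicate IRIs. The schema.org terms are illustrative only.
DRUG_SCHEMA = {
    "name": "http://schema.org/name",
    "manufacturer": "http://schema.org/manufacturer",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(records, schema, base_iri, id_field):
    """Turn tabular records into N-Triples lines using a shared schema mapping.

    Columns that are missing or empty in a record are skipped.
    """
    lines = []
    for record in records:
        subject = f"<{base_iri}{quote(str(record[id_field]))}>"
        for column, predicate in schema.items():
            value = record.get(column)
            if value is None or value == "":
                continue
            literal = escape_literal(str(value))
            lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines


# Made-up sample records standing in for a publisher's source data.
records = [
    {"id": "d1", "name": "Aspirin", "manufacturer": "Acme"},
    {"id": "d2", "name": "Ibuprofen", "manufacturer": ""},
]

triples = to_ntriples(records, DRUG_SCHEMA, "http://example.org/drug/", "id")
for line in triples:
    print(line)
```

A second publisher in the same domain could call to_ntriples with the same DRUG_SCHEMA and a different base IRI; because both datasets would then use the same predicates, they would be aligned at the schema level without any further mapping work.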
