Enhancement of a software chrestomathy for open linked data

A software chrestomathy collects small software systems as an aid to learning a subject. Because many languages, technologies and concepts may be involved, it contains a heterogeneous mix of code artifacts, documentation and relationships. This is problematic, as knowledge resources should convey their data in a structured manner. The data should be conveniently explorable, easily discoverable and accessible to humans as well as machines. This thesis addresses the problems created by the heterogeneity of the data by applying Linked Data principles. The 101companies chrestomathy is enriched with these principles: every important entity is treated as a resource that is dereferenceable through HTTP, and dereferencing it yields meaningful data about that entity. Additionally, the entities are linked with each other, so that all available data is reachable. It is shown that embracing the Linked Data principles alleviates the problems created by the diverse data sets. Furthermore, examples are given of how the approach enables further research options, such as clone detection, that were previously difficult if not infeasible.

Acknowledgements

The 101companies system is a joint work of the Softlang Team at the University of Koblenz-Landau. At the time of writing, it relies on contributions of Kevin Klein, Aleksey Lashin, Ralf Lämmel, Arkadi Schmidt, Thomas Schmorleiz and Andrei Varanovich. This thesis would not have been possible without them.
