The Effects of Licensing on Open Data: Computing a Measure of Health for Our Scholarly Record

As data collections become established in key disciplines, some of the longstanding barriers to data sharing become to dissolve; yet others remain. While metadata and ontologies help overcome the problems of finding and interpreting data, the lack of clarity over licensing remains a real impediment to data reuse. Freedom from legal restriction and uncertainty is essential for the effective sharing, combining and deriving of data from these distributed collections. Reuse and recombination of data will be greatly facilitated by expanding the definition of the semantic web to include the semantics of data licensing. We aim to express licensing terms in a computable manner, within the context of research practice, enabling us to infer the resulting state of rights, obligations and conditions that are inherited by derived and recombined datasets, using a mixed bag of licenses. Building off this we aim to simulate the effects of varying licensing practices within communities, proposing a measure of health of our scholarly record based on compatibility and restrictiveness of the licenses contained therein.