“Just-in-time” generation of datasets by considering structured representations of given consent for GDPR compliance

Data processing is increasingly subject to policies and regulations, such as the European General Data Protection Regulation (GDPR), which came into effect in May 2018. One important aspect of GDPR is informed consent, which captures a person's permission to use their personal information for specific data processing purposes. Organizations must demonstrate that they comply with these policies. The fines for non-compliance are significant enough to have driven research into facilitating compliance verification. The state of the art primarily focuses on, for instance, the analysis of prescriptive models and post hoc analysis of logs to check whether data processing complies with GDPR. We argue that GDPR compliance can be facilitated by ensuring that datasets used in processing activities are compliant with consent from the very start. The problem addressed in this paper is how to generate datasets that comply with given consent “just-in-time”. We propose RDF and OWL ontologies to represent the consent that an organization has collected and its relationship with data processing purposes. We use these ontologies to annotate schemas, allowing us to generate declarative mappings that transform (relational) data into RDF, driven by the annotations. We furthermore demonstrate how to create compliant datasets by altering the results of the mapping. The use of RDF and OWL allows us to implement the entire process declaratively using SPARQL. We have integrated all components into a service that also captures provenance information for each step, further contributing to the transparency needed to facilitate compliance verification. We demonstrate the approach with a synthetic dataset simulating users (re-)giving, withdrawing, and rejecting their consent to the data processing purposes of systems.
In summary, we argue that the approach facilitates transparency and compliance verification from the start, reducing the need for the post hoc compliance analysis common in the state of the art.
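
To make the idea of "just-in-time" generation concrete, the following is a minimal sketch in plain Python rather than the RDF, OWL, and SPARQL pipeline described above. It assumes that consent is recorded as timestamped events per user and purpose, and that only the most recent event counts. All names (`ConsentEvent`, `latest_consent`, `compliant_dataset`), the status labels, and the sample data are hypothetical and serve illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsentEvent:
    """One consent decision by a user for a processing purpose (hypothetical model)."""
    user_id: str
    purpose: str
    status: str  # assumed labels: "given", "withdrawn", or "rejected"
    timestamp: datetime


def latest_consent(events):
    """Return the most recent status for each (user, purpose) pair."""
    latest = {}
    # Processing events in chronological order lets later ones overwrite earlier ones.
    for event in sorted(events, key=lambda e: e.timestamp):
        latest[(event.user_id, event.purpose)] = event.status
    return latest


def compliant_dataset(rows, events, purpose):
    """Keep only rows whose user currently consents to the given purpose."""
    state = latest_consent(events)
    # Users without any recorded consent are excluded by default.
    return [row for row in rows if state.get((row["user_id"], purpose)) == "given"]


# Synthetic data: alice re-gives consent, bob withdraws, carol rejects, dave never answers.
events = [
    ConsentEvent("alice", "newsletter", "given", datetime(2019, 1, 1)),
    ConsentEvent("alice", "newsletter", "withdrawn", datetime(2019, 2, 1)),
    ConsentEvent("alice", "newsletter", "given", datetime(2019, 3, 1)),
    ConsentEvent("bob", "newsletter", "given", datetime(2019, 1, 1)),
    ConsentEvent("bob", "newsletter", "withdrawn", datetime(2019, 2, 15)),
    ConsentEvent("carol", "newsletter", "rejected", datetime(2019, 1, 10)),
]
rows = [
    {"user_id": "alice", "email": "alice@example.org"},
    {"user_id": "bob", "email": "bob@example.org"},
    {"user_id": "carol", "email": "carol@example.org"},
    {"user_id": "dave", "email": "dave@example.org"},
]

result = compliant_dataset(rows, events, "newsletter")
print(result)
```

Because the dataset is filtered at the moment it is requested, a withdrawal recorded before that moment is reflected immediately. This mirrors, in simplified form, the argument that compliance can be ensured from the start rather than checked afterwards.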
