Automatic Enforcement of Data Use Policies with DataLawyer

Data has value and is increasingly being exchanged for commercial and research purposes. Data, however, is typically accompanied by terms of use, which limit how it can be used. To date, there are only a few ad hoc methods to enforce these terms. We propose DataLawyer, a new system to formally specify usage policies and check them automatically at query runtime in a relational database management system (DBMS). We develop a new model to specify policies compactly and precisely. We introduce novel algorithms that evaluate policies efficiently and can cut policy-checking overhead to only a few percent of the total query runtime. We implement DataLawyer and evaluate it on a real database from the health-care domain.
