SMCQL: Secure Query Processing for Private Data Networks

People and machines are collecting data at an unprecedented rate. Despite this newfound abundance of data, progress has been slow in sharing it for open science, business, and other data-intensive endeavors. Many such efforts are stymied by privacy concerns and regulatory compliance issues. For example, many hospitals are interested in pooling their medical records for research, but none may disclose arbitrary patient records to researchers or other healthcare providers. In this context we propose the Private Data Network (PDN), a federated database for querying over the collective data of mutually distrustful parties. In a PDN, each member database reveals its tuples neither to its peers nor to the query writer. Instead, the user submits a query to an honest broker that plans and coordinates its execution over multiple private databases using secure multiparty computation (SMC). Here, each database's query execution is oblivious: its program counters and memory traces are independent of the other parties' inputs. We introduce SMCQL, a framework for executing PDN queries. SMCQL translates SQL statements into SMC primitives that compute query results over the union of the source databases without revealing sensitive information about individual tuples to peer data providers or to the honest broker. Only the honest broker and the querier receive the results of a PDN query. For fast, secure query evaluation, we explore a heuristics-driven optimizer that minimizes the PDN's use of secure computation and partitions its query evaluation into scalable slices.
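The division of labor described above can be illustrated with a toy COUNT query. Work on a party's own tuples stays in plaintext. Only the cross-party combination is done securely, and only the honest broker learns the result. What follows is a minimal sketch under stated assumptions, not SMCQL's implementation. It swaps in additive secret sharing over a prime field as the SMC primitive, which is simpler than the garbled-circuit protocols commonly used for secure query execution. It assumes semi-honest, non-colluding parties and uses bag (UNION ALL) semantics. All names (`MODULUS`, `local_filtered_count`, `share`, `secure_union_count`, and the hospital rows) are hypothetical.

```python
import secrets

# Hypothetical modulus for share arithmetic: a prime far larger than any count.
MODULUS = 2**61 - 1


def local_filtered_count(rows, predicate):
    """Plaintext step: a party filters and counts only its OWN rows.
    No other party's data is involved, so no secure computation is needed."""
    return sum(1 for row in rows if predicate(row))


def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod MODULUS.
    Any n-1 of the shares are uniformly random and reveal nothing about `value`."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_union_count(databases, predicate):
    """Count rows matching `predicate` over the bag union of all databases.

    Assumes semi-honest, non-colluding parties: party j only sees the j-th
    share from every party, and the broker only sees per-party share sums."""
    n = len(databases)
    # Each party counts locally in plaintext, then secret-shares its count.
    all_shares = [share(local_filtered_count(db, predicate), n) for db in databases]
    # Party j adds up the j-th share received from every party (itself included).
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    # The honest broker combines the partial sums; only the total is revealed.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical rows held by three hospitals.
    hospital_a = [{"patient": 1, "diag": "cdiff"}, {"patient": 2, "diag": "flu"}]
    hospital_b = [{"patient": 7, "diag": "cdiff"}]
    hospital_c = [{"patient": 9, "diag": "cdiff"}, {"patient": 4, "diag": "cdiff"}]
    total = secure_union_count(
        [hospital_a, hospital_b, hospital_c], lambda r: r["diag"] == "cdiff"
    )
    print(total)  # prints 4
```

The filter runs in plaintext because each hospital applies it only to its own rows. The one value that crosses party boundaries travels as random-looking shares. This mirrors, in miniature, the goal of minimizing the PDN's use of secure computation. A real planner would also have to handle joins, duplicate elimination across sites, and oblivious execution of the secure operators, none of which this sketch attempts.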