Secure Search on the Cloud via Coresets and Sketches

\emph{Secure search} is the problem of retrieving from a database table (or any unsorted array) the records matching specified attributes, as in SQL SELECT queries, but where both the database and the query are encrypted. Secure search has been the leading example of a practical application of Fully Homomorphic Encryption (FHE) since Gentry's seminal work; however, to the best of our knowledge, all state-of-the-art secure search algorithms to date are realized by a polynomial of degree $\Omega(m)$, where $m$ is the number of records, which is typically too slow in practice even for moderately sized $m$. In this work we present the first secure search algorithm that is realized by a polynomial whose degree is polynomial in $\log m$. We implemented our algorithm in an open-source library based on the HElib implementation of the Brakerski-Gentry-Vaikuntanathan FHE scheme, and ran experiments on Amazon's EC2 cloud. Our experiments show that we can retrieve the first match in a database of millions of entries in less than an hour using a single machine; the running time decreases almost linearly with the number of machines. Our result uses a new paradigm of employing coresets and sketches, which are modern data summarization techniques common in computational geometry and machine learning, to make homomorphic encryption more efficient. As a central tool we design a novel sketch that returns the first positive entry in a (not necessarily sparse) array; this sketch may be of independent interest.
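To make the degree bottleneck concrete, the following is a minimal plaintext sketch in Python of a naive first-match search written purely as arithmetic, the form an FHE evaluation would need. It is not the algorithm of this work: the function names `is_equal` and `first_match_index`, the small prime `p`, and the convention of returning -1 when nothing matches are all assumptions made for illustration. The equality test uses Fermat's little theorem, and the running product `no_match_so_far` multiplies one more factor per record, which is why this naive approach has degree $\Omega(m)$.

```python
def is_equal(a, b, p):
    """Return 1 if a == b (mod p) and 0 otherwise, for a prime p.

    By Fermat's little theorem, x^(p-1) mod p is 1 for x != 0 and 0 for x == 0,
    so 1 - (a - b)^(p-1) is a polynomial equality test of degree p - 1.
    """
    return (1 - pow(a - b, p - 1, p)) % p


def first_match_index(array, query, p):
    """Index of the first entry equal to query, or -1 if there is none.

    Written without data-dependent branching: only additions and
    multiplications on the 0/1 indicators. This runs on plaintext integers;
    it only mimics the shape of an encrypted computation.
    """
    no_match_so_far = 1      # product of (1 - indicator_j) over all j < i
    index_plus_one = 0       # stays 0 when no entry matches
    for i, value in enumerate(array):
        indicator = is_equal(value, query, p)
        # Exactly one term is nonzero: the one for the first matching i.
        index_plus_one += (i + 1) * indicator * no_match_so_far
        # The product gains one factor per record, so its degree grows with m.
        no_match_so_far *= (1 - indicator)
    return index_plus_one - 1


if __name__ == "__main__":
    records = [5, 3, 7, 3]   # hypothetical data, entries in [0, p)
    print(first_match_index(records, 3, 11))
    print(first_match_index(records, 9, 11))
```

The term for record $i$ multiplies $i$ indicator factors, so the final record's term already has degree proportional to $m$; avoiding this growth is the gap the abstract describes.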
