Short Paper: Industrial Feasibility of Private Information Retrieval

A popular security problem in database management is how to guarantee to a querying party that the database owner will not learn anything about the data that is retrieved, a problem known as Private Information Retrieval (PIR). While a variety of PIR schemes are known, they are still rarely considered for practical use cases. We investigate the feasibility of PIR in the telecommunications world to open up data of carriers to external parties. To this end, we provide a comparative survey of the current PIR state of the art (including ORAM schemes as a generalized concept) as well as an implementation and analysis of two PIR schemes for the considered use case. While our overall conclusion is that PIR techniques are not too far away from practical use in specific cases, we see ORAM as a more suitable candidate for further R&D investment.

1 BACKGROUND AND MOTIVATION

The telecommunications world is undergoing a transition in which carriers not only provide services such as telephony or internet access, but also attempt to monetize the huge amount of data associated with their subscribers' activity. Analyzing data such as call statistics or roaming behavior makes it possible to offer specifically tailored services and packages. Combining such data with data from third parties can potentially yield even more value. One direction, therefore, is to open up the existing databases to subscribing external parties. In fact, two rival carriers may well allow each other to query their subscriber databases, e.g., for detecting fraudulent activities or faults in the network. Another real-world scenario is answering the demands of public authorities who want to verify that a user made a call at a certain time, or to assess whether a certain IMEI or IMSI is part of the carrier's subscriber base.
An open practical problem is how to guarantee to the querying party that the database owner will not learn anything about the data that is retrieved, a problem known as Private Information Retrieval (PIR) (Goldreich and Ostrovsky, 1996). Accordingly, we assessed the feasibility of PIR schemes to support such use cases, where the typical database consists of 400,000 to 800,000 entries of IMEIs and/or IMSIs. This paper provides a comparative survey of PIR schemes in Section 2. We then discuss two schemes in detail, i.e., a Trapdoor Group scheme in Section 3.1 and an ORAM approach in Section 3.2. We provide detailed performance and runtime analysis data in Section 4.

Jäschke, A., Grohmann, B., Armknecht, F. and Schaad, A. Short Paper: Industrial Feasibility of Private Information Retrieval. DOI: 10.5220/0006382003950400. In Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017), Volume 4: SECRYPT, pages 395-400. ISBN: 978-989-758-259-2. Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

2 OVERVIEW AND COMPARISON OF EXISTING SCHEMES

The trivial solution for a user who wants to query a database without the database server learning about the query is to retrieve the entire database from the server and ignore all except the queried entries. Of course, this is very inefficient in terms of communication, but very efficient regarding computational effort because there is (almost) none. Its cost therefore provides a useful baseline: any new solution should require less communication than the trivial solution, usually at the price of additional computation in some form.

We split existing works that realize some form of private information retrieval into four main approaches. A detailed overview of the different schemes is presented in a forthcoming paper; here we only categorize the schemes into these high-level approaches. Some of the mentioned schemes have also been presented in (Ostrovsky and Skeith III, 2007) and (Olumofin and Goldberg, 2011), and some observations about computational complexity can be found in (Gasarch, 2004) and (Sion and Carbunar, 2007).

• Homomorphic Approaches: The user masks (e.g., homomorphically encrypts) the queried index, and the server algebraically combines all indices with the database entries to obtain a masked version of only the queried entry. The user then removes the mask to obtain the result. Publications following this general idea are (Kushilevitz and Ostrovsky, 1997), (Chang, 2004), (Melchor et al., 2016) (Group Homomorphic), (Trostle and Parrish, 2010) (Trapdoor Group), (Kiayias et al., 2015; Lipmaa, 2009; Ishai and Paskin, 2007) (Branching Programs), (Melchor and Gaborit, 2007) (Lattice-based) and (Doröz et al., 2014) (FHE-based).

• ORAM Approaches: ORAM (oblivious RAM) comes from the field of software protection, but can also be used to protect privacy in databases. ORAM requires a slightly different setup: the database must be encrypted, and thus there must be some key management mechanism. In contrast to pure PIR, ORAM offers the added option of writing, i.e., changing or adding entries. Publications based on ORAM are (Mayberry et al., 2014; Stefanov et al., 2013; Ma et al., 2016) (ORAM-Tree), (Devadas et al., 2016) (Onion-ORAM), (Apon et al., 2014) (FHE-ORAM) and (Lorch et al., 2013) (Parallel-Tree-ORAM).

• Garbled Approaches: Since PIR consists of two parties (the user and the server) trying to compute a function (the correct database entry) without the server learning the user's input (the query index), it is natural to look to multiparty computation, where two or more parties jointly compute a function without learning anything except their own input and the result of the computation. Publications involving this approach are (Lu and Ostrovsky, 2013; Gentry et al., 2014a; Gentry et al., 2014b).
• Other Approaches: The φ-Hiding Approach (Cachin et al., 1999), the Trapdoor Permutation Approach (Kushilevitz and Ostrovsky, 2000), and the Sender Anonymity Approach (Trostle and Parrish, 2010).

Table 1 compares the schemes from the above approaches, indicating a particularly good value with a (light) green background and particularly unfavorable aspects with a (darker) red background. The aspects considered are CommU (communication from the user to the server), CommS (communication from the server to the user), CompU (user computation effort) and CompS (server computation effort). The variables used are:

• $n$ is the number of database elements
• $B$ is the block size
• $\lambda$ is the security parameter
• $M$ (resp. $C$) is the message (resp. ciphertext) space of the encryption scheme
• $m$ is a finite group order

The leftmost column denotes the approach as presented above: H for homomorphic, O for ORAM, G for garbled and "-" if none apply.

3 CHOOSING AND OPTIMIZING

To test performance for the use cases described in Section 1, we implemented two approaches, one homomorphic and one ORAM-based, as these differ greatly yet can both solve our PIR problem. Concretely, we chose and modified a Trapdoor Group Scheme based on (Trostle and Parrish, 2010) and the Path-ORAM Scheme (Stefanov et al., 2013) for their conceptual simplicity.

3.1 The (Optimized) Trapdoor Group Scheme

The original scheme (Trostle and Parrish, 2010) only allows retrieval of an entire row (i.e., $\sqrt{n}$ out of $n$) of database entries. We extend it to allow single-entry retrieval and to minimize communication. We present this optimized scheme as a protocol:

Database Structure: $n$ elements of $\mathbb{Z}_N$ arranged as a $\ln(n)$-dimensional array with entries $x_{i_1,\dots,i_{\ln(n)}}$, $i_j = 1,\dots,n^{1/\ln(n)}$ for $j = 1,\dots,\ln(n)$.

Prerequisites: We assume that we work in the group $(\mathbb{Z}_m, +)$ and that $m$ and $N$ are coprime.

Queries: Suppose the user wants to query the element $x_{i_1,\dots,i_{\ln(n)}}$.

1. The user selects $m$ as the group order above depending on the required security level, but at least $m > N^{\lceil \ln(n) \rceil} \cdot n \cdot (N-1)$.

2. The user randomly selects secret $b_j \in \mathbb{Z}_m$, $j = 1,\dots,\ln(n)$, and $\ln(n) \cdot n^{1/\ln(n)}$ coefficients $e_{i,j}$, $i = 1,\dots,n^{1/\ln(n)}$, $j = 1,\dots,\ln(n)$, with three restrictions:

i. $e_{i,j} < \sqrt[\ln(n)]{\frac{m}{n \cdot (N-1)}}$ for all $(i,j)$.

ii. For $j = 1,\dots,\ln(n)$: If $i_j \neq i$, $e_{i,j}$ is a multiple of $N$ (i.e., $e_i = a_i \cdot N$ for some $a_i$).

SECRYPT 2017 14th International Conference on Security and Cryptography
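The parameter constraints of query steps 1 and 2 in Section 3.1 can be illustrated with a short sketch. It is not the authors' implementation: the function names, the `slack_bits` parameter, the use of 0-based indices, and the choice to round $\ln(n)$ up to an integer dimension $d$ are all assumptions made for illustration. Only restrictions i and ii are enforced; the third restriction is not reproduced in the text above, so the coefficient at the queried coordinate is merely a placeholder bounded by restriction i.

```python
import math
import secrets


def iroot(x: int, d: int) -> int:
    """Largest integer r with r**d <= x (exact binary search on big ints)."""
    lo, hi = 0, 1
    while hi ** d <= x:
        hi *= 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if mid ** d <= x:
            lo = mid
        else:
            hi = mid
    return lo


def choose_params(n: int, N: int, slack_bits: int = 16):
    """Pick dimension d, side length, and a group order m for step 1.

    Assumptions: d = ceil(ln n); slack_bits is a hypothetical knob that
    stands in for 'depending on the required security level'.
    """
    d = max(1, math.ceil(math.log(n)))
    side = math.ceil(n ** (1.0 / d))
    while side ** d < n:          # guard against floating-point rounding
        side += 1
    m = N ** d * n * (N - 1) + 1 + secrets.randbits(slack_bits)
    while math.gcd(m, N) != 1:    # prerequisite: m and N are coprime
        m += 1
    return d, side, m


def sample_query_coeffs(index, n, N, m, d, side):
    """Sample b_j and e_{i,j} under restrictions i and ii only."""
    k = n * (N - 1)
    # Restriction i: e < (m / k)**(1/d)  <=>  e**d * k < m (integers).
    r = iroot((m - 1) // k, d)
    b = [secrets.randbelow(m) for _ in range(d)]
    e = []
    for j in range(d):
        row = []
        for i in range(side):
            if i != index[j]:
                # Restriction ii: a multiple of N that still respects i.
                row.append(N * secrets.randbelow(r // N + 1))
            else:
                # Placeholder: restriction iii is not shown in the text.
                row.append(secrets.randbelow(r + 1))
        e.append(row)
    return b, e


if __name__ == "__main__":
    n, N = 400_000, 2 ** 16       # example sizes, chosen for illustration
    d, side, m = choose_params(n, N)
    b, e = sample_query_coeffs([0] * d, n, N, m, d, side)
    print(d, side, m.bit_length())
```

The integer test `e**d * k < m` replaces the real-valued root in restriction i so that the check stays exact for the large values of $m$ that the bound in step 1 produces.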

[1] Andy Parrish, et al. Efficient Computationally Private Information Retrieval from Anonymity or Trapdoor Groups, 2010, ISC.

[2] Anat Paskin-Cherniavsky, et al. Evaluating Branching Programs on Encrypted Data, 2007, TCC.

[3] Rafail Ostrovsky, et al. One-Way Trapdoor Permutations Are Sufficient for Non-trivial Single-Server Private Information Retrieval, 2000, EUROCRYPT.

[4] Silvio Micali, et al. Computationally Private Information Retrieval with Polylogarithmic Communication, 1999, EUROCRYPT.

[5] Ling Ren, et al. Path ORAM, 2012, J. ACM.

[6] Jinsheng Zhang, et al. SE-ORAM: A Storage-Efficient Oblivious RAM for Privacy-Preserving Access to Cloud Storage, 2016, 2016 IEEE 3rd International Conference on Cyber Security and Cloud Computing (CSCloud).

[7] Marc-Olivier Killijian, et al. XPIR: Private Information Retrieval for Everyone, 2016, Proc. Priv. Enhancing Technol.

[8] Joshua Schiffman, et al. Shroud: ensuring private access to large-scale data in the data center, 2013, FAST.

[9] William I. Gasarch, et al. A Survey on Private Information Retrieval (Column: Computational Complexity), 2004, Bull. EATCS.

[10] Elaine Shi, et al. Verifiable Oblivious Storage, 2014, Public Key Cryptography.

[11] Rafail Ostrovsky, et al. A Survey of Single Database PIR: Techniques and Applications, 2007, IACR Cryptol. ePrint Arch.

[12] Philippe Gaborit, et al. A Lattice-Based Computationally-Efficient Private Information Retrieval Protocol, 2007, IACR Cryptol. ePrint Arch.

[13] Travis Mayberry, et al. Efficient Private File Retrieval by Combining ORAM and PIR, 2014, NDSS.

[14] Berk Sunar, et al. Bandwidth Efficient PIR from NTRU, 2014, Financial Cryptography Workshops.

[15] Rafail Ostrovsky, et al. Replication is not needed: single database, computationally-private information retrieval, 1997, Proceedings 38th Annual Symposium on Foundations of Computer Science.

[16] Aggelos Kiayias, et al. Optimal Rate Private Information Retrieval from Homomorphic Encryption, 2015, Proc. Priv. Enhancing Technol.

[17] Craig Gentry, et al. Outsourcing Private RAM Computation, 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[18] Rafail Ostrovsky, et al. How to Garble RAM Programs, 2013, EUROCRYPT.

[19] Rafail Ostrovsky, et al. Software protection and simulation on oblivious RAMs, 1996, JACM.

[20] Rafail Ostrovsky, et al. Garbled RAM Revisited, 2014, EUROCRYPT.

[21] Ian Goldberg, et al. Revisiting the Computational Practicality of Private Information Retrieval, 2011, Financial Cryptography.

[22] Yan-Cheng Chang, et al. Single Database Private Information Retrieval with Logarithmic Communication, 2004, ACISP.

[23] Elaine Shi, et al. Onion ORAM: A Constant Bandwidth Blowup Oblivious RAM, 2016, TCC.

[24] Radu Sion, et al. On the Practicality of Private Information Retrieval, 2007, NDSS.

[25] Helger Lipmaa, et al. First CPIR Protocol with Data-Dependent Computation, 2009, ICISC.