Pretzel: Email encryption and provider-supplied functions are compatible

Emails today are often encrypted, but only between mail servers: the vast majority of emails are exposed in plaintext to the mail servers that handle them. While better than no encryption, this arrangement leaves open the possibility of attacks, privacy violations, and other disclosures. Publicly, email providers have stated that default end-to-end encryption would conflict with essential functions (spam filtering, etc.), because these functions require analyzing email text. The goal of this paper is to demonstrate that there is no conflict. We do so by designing, implementing, and evaluating Pretzel. Starting from a cryptographic protocol that enables two parties to jointly perform a classification task without revealing their inputs to each other, Pretzel refines and adapts this protocol to the email context. Our experimental evaluation of a prototype demonstrates that email can be encrypted end-to-end and providers can compute over it, at tolerable cost: clients must devote some storage and processing, and provider overhead is roughly 5x versus the status quo.
