M2R: Enabling Stronger Privacy in MapReduce Computation

New big-data analysis platforms can enable distributed computation on encrypted data by utilizing trusted computing primitives available in commodity server hardware. We study techniques for ensuring privacy-preserving computation in the popular MapReduce framework. In this paper, we first show that protecting only individual units of distributed computation (e.g. map and reduce units), as proposed in recent works, leaves several important channels of information leakage exposed to the adversary. Next, we analyze a variety of design choices in achieving a stronger notion of private execution that is the analogue of using a distributed oblivious-RAM (ORAM) across the platform. We develop a simple solution which avoids using the expensive ORAM construction, and incurs only an additive logarithmic factor of overhead to the latency. We implement our solution in a system called M2R, which enhances an existing Hadoop implementation, and evaluate it on seven standard MapReduce benchmarks. We show that it is easy to port most existing applications to M2R by changing fewer than 43 lines of code. M2R adds fewer than 500 lines of code to the TCB, which is less than 0.16% of the Hadoop codebase. M2R offers a factor of 1.3× to 44.6× lower overhead than extensions of previous solutions with equivalent privacy. M2R adds a total of 17% to 130% overhead over the insecure baseline solution that ignores the leakage channels M2R addresses.

[1]  David Chaum,et al.  Untraceable electronic mail, return addresses, and digital pseudonyms , 1981, CACM.

[2]  Rafail Ostrovsky,et al.  Software protection and simulation on oblivious RAMs , 1996, JACM.

[3]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[4]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[5]  Dawn Xiaodong Song,et al.  Practical techniques for searches on encrypted data , 2000, Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000.

[6]  Ran Canetti,et al.  Universally composable security: a new paradigm for cryptographic protocols , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[7]  Ramakrishnan Srikant,et al.  Order preserving encryption for numeric data , 2004, SIGMOD '04.

[8]  T. Alves,et al.  TrustZone : Integrated Hardware and Software Security , 2004 .

[9]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[10]  Douglas Wikström,et al.  A Universally Composable Mix-Net , 2004, TCC.

[11]  David Schultz,et al.  The Program Counter Security Model: Automatic Detection and Removal of Control-Flow Side Channel Attacks , 2005, ICISC.

[12]  Marek Klonowski,et al.  Provable Anonymity for Networks of Mixes , 2005, Information Hiding.

[13]  Jan Camenisch,et al.  Mix-Network with Stronger Security , 2005, Privacy Enhancing Technologies.

[14]  Rafail Ostrovsky,et al.  Searchable symmetric encryption: improved definitions and efficient constructions , 2006, CCS '06.

[15]  Vitaly Shmatikov,et al.  Measuring relationship anonymity in mix networks , 2006, WPES '06.

[16]  Yehuda Lindell,et al.  Introduction to Modern Cryptography (Chapman & Hall/Crc Cryptography and Network Security Series) , 2007 .

[17]  Adrian Perrig,et al.  SecVisor: a tiny hypervisor to provide lifetime kernel code integrity for commodity OSes , 2007, SOSP.

[18]  Yehuda Lindell,et al.  Introduction to Modern Cryptography , 2004 .

[19]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[20]  George Danezis,et al.  A Survey of Anonymous Communication Channels , 2008 .

[21]  Michael K. Reiter,et al.  Flicker: an execution infrastructure for tcb minimization , 2008, Eurosys '08.

[22]  Xiaoxin Chen,et al.  Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems , 2008, ASPLOS.

[23]  Ariel J. Feldman,et al.  Lest we remember: cold-boot attacks on encryption keys , 2008, CACM.

[24]  Craig Gentry,et al.  Fully homomorphic encryption using ideal lattices , 2009, STOC '09.

[25]  Craig Gentry,et al.  Fully Homomorphic Encryption over the Integers , 2010, EUROCRYPT.

[26]  Vitaly Shmatikov,et al.  Airavat: Security and Privacy for MapReduce , 2010, NSDI.

[27]  Udo Steinberg,et al.  NOVA: a microhypervisor-based secure virtualization architecture , 2010, EuroSys '10.

[28]  Adrian Perrig,et al.  TrustVisor: Efficient TCB Reduction and Attestation , 2010, 2010 IEEE Symposium on Security and Privacy.

[29]  Ruby B. Lee,et al.  Scalable architectural support for trusted software , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[30]  Albert G. Greenberg,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM '10.

[31]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[32]  Haibo Chen,et al.  CloudVisor: retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization , 2011, SOSP.

[33]  Michael T. Goodrich,et al.  Privacy-Preserving Access of Outsourced Data via Oblivious RAM Simulation , 2010, ICALP.

[34]  Hari Balakrishnan,et al.  CryptDB: protecting confidentiality with encrypted query processing , 2011, SOSP.

[35]  Shengsheng Huang,et al.  HiBench : A Representative and Comprehensive Hadoop Benchmark Suite , 2012 .

[36]  Ruby B. Lee,et al.  Architectural support for hypervisor-secure virtualization , 2012, ASPLOS XVII.

[37]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[38]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[39]  Simha Sethumadhavan,et al.  Side-channel vulnerability factor: A metric for measuring information leakage , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[40]  Taesoo Kim,et al.  STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud , 2012, USENIX Security Symposium.

[41]  Samuel Madden,et al.  Processing Analytical Queries over Encrypted Data , 2013, Proc. VLDB Endow..

[42]  Elaine Shi,et al.  Memory Trace Oblivious Program Execution , 2013, 2013 IEEE 26th Computer Security Foundations Symposium.

[43]  Elaine Shi,et al.  Path ORAM: an extremely simple oblivious RAM protocol , 2012, CCS.

[44]  Ramarathnam Venkatesan,et al.  Secure database-as-a-service with Cipherbase , 2013, SIGMOD '13.

[45]  Nickolai Zeldovich,et al.  An Ideal-Security Protocol for Order-Preserving Encoding , 2013, 2013 IEEE Symposium on Security and Privacy.

[46]  Jun Han,et al.  KISS: "Key It Simple and Secure" Corporate Key Management , 2013, TRUST.

[47]  Marina Blanton,et al.  Data-oblivious graph algorithms for secure computation and outsourcing , 2013, ASIA CCS '13.

[48]  Carlos V. Rozas,et al.  Innovative instructions and software model for isolated execution , 2013, HASP '13.

[49]  Shweta Shinde,et al.  AUTOCRYPT: enabling homomorphic computation on servers to protect sensitive web content , 2013, CCS.

[50]  Beng Chin Ooi,et al.  Distributed data management using MapReduce , 2014, CSUR.

[51]  Galen C. Hunt,et al.  Shielding Applications from an Untrusted Cloud with Haven , 2014, OSDI.

[52]  Olga Ohrimenko,et al.  Data-Oblivious Algorithms for Privacy-Preserving Access to Cloud Storage , 2014 .

[53]  Christos Gkantsidis,et al.  VC 3 : Trustworthy Data Analytics in the Cloud , 2014 .

[54]  Elaine Shi,et al.  Automating Efficient RAM-Model Secure Computation , 2014, 2014 IEEE Symposium on Security and Privacy.

[55]  Shweta Shinde,et al.  A model counter for constraints over unbounded strings , 2014, PLDI.

[56]  Radu Sion,et al.  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data Confidentiality , 2011, IEEE Transactions on Knowledge and Data Engineering.

[57]  James Newsome,et al.  MiniBox: A Two-Way Sandbox for x86 Native Code , 2014, USENIX Annual Technical Conference.

[58]  Elaine Shi,et al.  GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation , 2015, ASPLOS.

[59]  P. Saxena,et al.  Protecting Legacy Applications with a Purely Hardware TCB , 2015 .

[60]  Kian-Lee Tan,et al.  epiC: an extensible and scalable system for processing Big Data , 2014, The VLDB Journal.

[61]  Shweta Shinde,et al.  Preventing Your Faults From Telling Your Secrets: Defenses Against Pigeonhole Attacks , 2015, ArXiv.

[62]  Stratis Ioannidis,et al.  GraphSC: Parallel Secure Computation Made Easy , 2015, 2015 IEEE Symposium on Security and Privacy.

[63]  Beng Chin Ooi,et al.  In-Memory Big Data Management and Processing: A Survey , 2015, IEEE Transactions on Knowledge and Data Engineering.