Airavat: Security and Privacy for MapReduce

We present Airavat, a MapReduce-based system that provides strong security and privacy guarantees for distributed computations on sensitive data. Airavat is a novel integration of mandatory access control and differential privacy. Data providers control the security policy for their sensitive data, including a mathematical bound on potential privacy violations. Users without security expertise can perform computations on the data, but Airavat confines these computations, preventing information leakage beyond the data provider's policy. Our prototype implementation demonstrates the flexibility of Airavat on several case studies. The prototype is efficient: its run times on Amazon's cloud computing infrastructure are within 32% of those of a MapReduce system with no security.
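
To make the "mathematical bound on potential privacy violations" concrete, the following is a minimal sketch of how a differentially private sum with a provider-set privacy budget might look. It is not Airavat's implementation: the class name `PrivateSum`, its parameters, and the choice of the Laplace mechanism over clamped values are assumptions made for illustration, and the mandatory access control side of the system is not modeled at all.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query would exceed the provider's privacy budget."""


class PrivateSum:
    """Hypothetical reducer-side helper: noisy sums under a fixed epsilon budget."""

    def __init__(self, total_epsilon, lo, hi, seed=None):
        if total_epsilon <= 0 or lo > hi:
            raise ValueError("need total_epsilon > 0 and lo <= hi")
        self.remaining = total_epsilon   # budget chosen by the data provider
        self.lo, self.hi = lo, hi        # declared range of each mapper output
        self._rng = random.Random(seed)

    def query(self, values, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExceeded("privacy budget exhausted")
        self.remaining -= epsilon

        # Clamp every value into the declared range, so that adding or
        # removing one record changes the sum by at most `sensitivity`.
        clamped = [min(max(v, self.lo), self.hi) for v in values]
        sensitivity = max(abs(self.lo), abs(self.hi))

        # Laplace(0, scale) noise, sampled as the difference of two
        # exponentials that each have mean `scale`.
        scale = sensitivity / epsilon
        if scale == 0:
            noise = 0.0
        else:
            noise = (self._rng.expovariate(1 / scale)
                     - self._rng.expovariate(1 / scale))
        return sum(clamped) + noise


if __name__ == "__main__":
    ps = PrivateSum(total_epsilon=1.0, lo=0.0, hi=5.0, seed=0)
    print(ps.query([1, 2, 100], epsilon=0.5))  # 100 is clamped to 5 before summing
    print(ps.remaining)
```

In this sketch the untrusted computation only supplies `values`; clamping, noise, and budget accounting sit in code the provider trusts, which mirrors the confinement idea described above in spirit only.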
