Coarse-Grained Information Flow Control on Hybrid Clouds

A growing number of enterprises have adopted hybrid cloud strategies to combine the security of on-premises clouds with the low cost of public clouds. The key challenge of hybrid clouds, however, stems from the difficulty of efficiently specifying where data should be stored and where information may flow. To address both security concerns and performance requirements, we introduce a coarse-grained information flow control (CIFC) model that restricts the storage, access, and disclosure of confidential data in public clouds. The CIFC model aims to provide information control implicitly, without the large overhead of periodically checking access privileges. Moreover, because the CIFC model may require data to be redistributed whenever the secrecy level of a dataset changes, we formulate data redistribution as an optimization problem and propose the Partition Biased Sampling Algorithm (PBSA) to solve it. We implemented the CIFC model on top of Spark, and our results show that Spark applications can achieve 1.4 to 2.1 times better performance by using the additional computational capacity of the public cloud to process non-sensitive data. Furthermore, we integrate PBSA into Spark and demonstrate a saving of more than 35% in execution time compared to Spark's default data distribution strategy.
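As a rough illustration of the coarse-grained idea described above, the following minimal sketch attaches one secrecy label to each data partition and one clearance to each node, and allows a partition to be placed only where its label does not exceed the node's clearance. This is not the paper's CIFC implementation or PBSA; the two-level `Secrecy` lattice and the names `Node`, `Partition`, `may_flow`, `eligible_nodes`, and `join_label` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Secrecy(IntEnum):
    """Hypothetical two-level secrecy lattice (an assumption, not from the paper)."""
    PUBLIC = 0
    CONFIDENTIAL = 1


@dataclass(frozen=True)
class Node:
    name: str
    clearance: Secrecy  # highest secrecy level this node is allowed to hold


@dataclass(frozen=True)
class Partition:
    pid: int
    label: Secrecy  # one label per partition: the "coarse-grained" unit


def may_flow(label: Secrecy, clearance: Secrecy) -> bool:
    """Data may flow to a node only if its label does not exceed the clearance."""
    return label <= clearance


def eligible_nodes(partition: Partition, nodes: list[Node]) -> list[Node]:
    """Nodes on which the partition may be stored or processed."""
    return [n for n in nodes if may_flow(partition.label, n.clearance)]


def join_label(a: Secrecy, b: Secrecy) -> Secrecy:
    """Label of data derived from two inputs: the more restrictive of the two."""
    return max(a, b)


nodes = [
    Node("onprem-1", Secrecy.CONFIDENTIAL),  # on-premises node
    Node("public-1", Secrecy.PUBLIC),        # public-cloud node
]
confidential = Partition(0, Secrecy.CONFIDENTIAL)
non_sensitive = Partition(1, Secrecy.PUBLIC)

print([n.name for n in eligible_nodes(confidential, nodes)])
print([n.name for n in eligible_nodes(non_sensitive, nodes)])
```

Because the check is made once per partition at placement time rather than on every access, a sketch like this hints at how control can be enforced implicitly; how the actual model derives and updates labels is not specified in the abstract.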
