Do You Know Where Your Data Are? Secure Data Capsules for Deployable Data Protection

Do you know where your data are? Who can see them? Who can modify them without a trace? Who can aggregate, summarize, and embed them for purposes other than yours? We don’t, and we suspect neither do you.

The problem is that we lack a widely available mechanism to answer these questions and yet, paradoxically, all evidence suggests that the problem should have been solved long ago. It is critical: incidents involving sensitive data leakage, unauthorized access, and integrity violations (accidental or not) are a daily occurrence [1]. It is well known, as evidenced by the volume of relevant government regulation and pontification from privacy advocates. It is interesting, since it has inspired much research into data confidentiality, integrity, and authorization. Yet publicizing it, regulating it, and talking about it have not led to solving the problem effectively for the vast majority of users.

Why? We believe the root of this paradox lies in the disconnect between the objectives of research, policy, and industry on one hand and the needs of the real world on the other. Research has largely focused on elegance and intellectual exploration, while industry has built expedient solutions for media content protection and enterprise rights management, guided by projected revenue. In either case, the problem is solved for some users, under narrow scenarios, but neither research results nor rights management systems have achieved broad applicability and deployment.

In this paper, we examine the problem of protecting data for all users, and not just for some. We use broad applicability as the driving goal, and explore the challenges and promises of reaching towards that goal with efficiency and high assurance.

To be broadly applicable, a data protection mechanism must support, first and foremost, backward compatibility. Any mechanism must integrate seamlessly with unmodified applications and data formats.
Even in the face of vulnerabilities, pragmatic concerns make it hard to argue for the wholesale migration of systems from existing infrastructure and legacy programs to something completely new. Research proposals that require completely new operating systems [11, 32], languages [21], or both [25] do not apply to legacy systems and therefore contribute only indirectly to solving our problem.

Beyond the ability to support existing applications, a data protection mechanism should be flexible. Solutions based on a centrally vetted, closed set of applications are incompatible with today’s diverse, dynamic systems. A prime example is commercially available enterprise rights management (ERM) solutions. Although technical details and scrutiny are scarce, ERM products appear to be based on a common proprietary application framework [3, 20] or on application-specific modifications [12]. Consumerization, which fosters individual choice, leads many users to find the “walled gardens” of ERM systems unacceptable. On the other hand, the limited set of supported applications inhibits interoperability among enterprises, a cost organizations often refuse to bear. When an organization does decide to move to a different application platform, it may risk losing continuity within its own documents, at a potentially astronomical cost.

As a motivating example, consider protecting the privacy and integrity of the personal health records of a patient, whom we will call Owen. In the USA, the privacy of these records is a requirement laid down by statute, with significant penalties for violations [2]. Their integrity is also critical: unauthorized modifications could be fatal (consider the inadvertent removal of Owen’s allergies from his record).

Current usage exhibits significant complexity. Owen’s record may be handled by all the physicians who treat or advise him, all of his insurers over time and employment changes, and all the hospitals and clinics where he is seen.
This usually large set of people and organizations views and modifies Owen’s record on varied platforms (clinic PCs, mobile tablets, web forms), managed by IT staff of varying skill. It is unreasonable to expect such a diverse conglomeration of users and systems to switch en masse to a new, common OS and a small set of blessed applications. At the same time, restrictive solutions like ERM are too specific, and do not work seamlessly across hospitals with distinct infrastructures (e.g., when Owen is treated while on vacation abroad).

Although representative in criticality and complexity, this example is by no means unique. From social networking to advertising systems, from financial transactions to email records, examples abound, each with varying privacy and integrity requirements.

[1]  Christoforos E. Kozyrakis,et al.  Raksha: a flexible information flow architecture for software security , 2007, ISCA '07.

[2]  Richard Johnson,et al.  The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.

[3]  Andrew Warfield,et al.  Practical taint-based protection using demand emulation , 2006, EuroSys.

[4]  David Lie,et al.  Splitting interfaces: making trust between applications and operating systems configurable , 2006, OSDI '06.

[5]  Qing Zhang,et al.  Neon: system support for derived data management , 2010, VEE '10.

[6]  Craig B. Zilles,et al.  A real system evaluation of hardware atomicity for software speculation , 2010, ASPLOS XV.

[7]  Andrew C. Myers,et al.  Protecting privacy using the decentralized label model , 2000, Foundations of Intrusion Tolerant Systems, 2003 [Organically Assured and Survivable Information Systems].

[8]  Scott Shenker,et al.  Towards Practical Taint Tracking , 2010 .

[9]  Barbara Liskov,et al.  Protecting privacy using the decentralized label model , 2000 .

[10]  Babak Falsafi,et al.  Flexible Hardware Acceleration for Instruction-Grain Lifeguards , 2009, IEEE Micro.

[11]  Guru Venkataramani,et al.  FlexiTaint: A programmable accelerator for dynamic taint propagation , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[12]  Eddie Kohler,et al.  Information flow control for standard OS abstractions , 2007, SOSP.

[13]  Lynda L. McGhie,et al.  THE HEALTH INSURANCE PORTABILITY AND ACCOUNTABILITY ACT , 2004 .

[14]  Adrian Perrig,et al.  TrustVisor: Efficient TCB Reduction and Attestation , 2010, 2010 IEEE Symposium on Security and Privacy.

[15]  Donald E. Porter,et al.  Laminar: practical fine-grained decentralized information flow control , 2009, PLDI '09.

[16]  Michael Backes,et al.  Automatic Discovery and Quantification of Information Leaks , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[17]  Wei Xu,et al.  Taint-Enhanced Policy Enforcement: A Practical Approach to Defeat a Wide Range of Attacks , 2006, USENIX Security Symposium.

[18]  Christoforos E. Kozyrakis,et al.  Decoupling Dynamic Information Flow Tracking with a dedicated coprocessor , 2009, 2009 IEEE/IFIP International Conference on Dependable Systems & Networks.

[19]  Stephen McCamant,et al.  Quantitative information flow as network flow capacity , 2008, PLDI '08.

[20]  Marianne Shaw,et al.  Scale and performance in the Denali isolation kernel , 2002, OSDI '02.

[21]  Steve Vandebogart,et al.  Labels and event processes in the Asbestos operating system , 2005, TOCS.

[22]  Richard Johnson,et al.  The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, CGO.

[23]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[24]  Eddie Kohler,et al.  Making information flow explicit in HiStar , 2006, OSDI '06.

[25]  Herbert Bos,et al.  Pointless tainting?: evaluating the practicality of pointer tainting , 2009, EuroSys '09.

[26]  Anthony D. Joseph,et al.  Virtics : A System for Privilege Separation of Legacy Desktop Applications , 2010 .

[27]  Xiaoxin Chen,et al.  Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems , 2008, ASPLOS.