Privacy oracle: a system for finding application leaks with black box differential testing

We describe the design and implementation of Privacy Oracle, a system that reports on application leaks of user information via the network traffic that they send. Privacy Oracle treats each application as a black box, without access to either its internal structure or communication protocols. This means that it can be used over a broad range of applications and information leaks (i.e., not only Web traffic or credit card numbers). To accomplish this, we develop a differential testing technique in which perturbations in the application inputs are mapped to perturbations in the application outputs to discover likely leaks; we leverage alignment algorithms from computational biology to find high quality mappings between different byte-sequences efficiently. Privacy Oracle includes this technique and a virtual machine-based testing system. To evaluate it, we tested 26 popular applications, including system and file utilities, media players, and IM clients. We found that Privacy Oracle discovered many small and previously undisclosed information leaks. In several cases, these are leaks of directly identifying information that are regularly sent in the clear (without end-to-end encryption) and which could make users vulnerable to tracking by third parties or providers.

[1]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[2]  A. Dress,et al.  Multiple DNA and protein sequence alignment based on segment-to-segment comparison. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Burkhard Morgenstern,et al.  DIALIGN: finding local similarities by multiple sequence alignment , 1998, Bioinform..

[4]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[5]  Vinod Yegneswaran,et al.  Characteristics of internet background radiation , 2004, IMC '04.

[6]  Marc Dacier,et al.  ScriptGen: an automated script generation tool for Honeyd , 2005, 21st Annual Computer Security Applications Conference (ACSAC'05).

[7]  Randy H. Katz,et al.  Protocol-Independent Adaptive Replay of Application Dialog , 2006, NDSS.

[8]  Jon Crowcroft,et al.  Efficient sequence alignment of network traffic , 2006, IMC '06.

[9]  R. Evans,et al.  Differential testing: a new approach to change detection , 2007, ESEC/SIGSOFT FSE.

[10]  Heng Yin,et al.  Panorama: capturing system-wide information flow for malware detection and analysis , 2007, CCS '07.

[11]  Gregg Rothermel,et al.  Dynamic Characterization of Web Application Interfaces , 2007, FASE.

[12]  Tadayoshi Kohno,et al.  Devices That Tell on You: Privacy Trends in Consumer Ubiquitous Computing , 2007, USENIX Security Symposium.

[13]  Landon P. Cox,et al.  TightLip: Keeping Applications from Spilling the Beans , 2007, NSDI.

[14]  Alberto Savoia,et al.  Differential testing: a new approach to change detection , 2007, ESEC-FSE '07.

[15]  J. W. Hunt,et al.  An Algorithm for Differential File Comparison , 2008 .

[16]  Stephen McCamant,et al.  Quantitative information flow as network flow capacity , 2008, PLDI '08.

[17]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .