A Methodology for Information Flow Experiments

Information flow analysis has largely focused on methods that require access to the program in question or total control over the analyzed system. We consider the case where the analyst has neither control over nor a white-box model of the analyzed system. We formalize such limited information flow analyses and study one instance of them: detecting the usage of data by websites. We reduce these problems to ones of causal inference by proving a connection between noninterference and causation. Leveraging this connection, we provide a systematic black-box methodology based on experimental science and statistical analysis. Our systematic study leads to practical advice for detecting web data usage, a previously unformalized area. We illustrate these concepts with a series of experiments collecting data on the use of information by websites.
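
As a rough illustration of how a black-box experiment of this kind might be analyzed, the sketch below runs a permutation test on measurements from two randomly assigned groups of browser instances. The abstract does not name a specific test, so the choice of a permutation test, the function name `permutation_test`, and the sample numbers are all assumptions made for illustration, not the authors' procedure.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_permutations=2000, seed=0):
    """Estimate a one-sided p-value for mean(treated) > mean(control).

    Hypothetical helper: `treated` and `control` hold one measurement per
    experimental unit (for example, how often a given ad was shown to a
    browser instance). Units are assumed to have been randomly assigned.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Re-draw the group labels at random, as the null hypothesis
        # (the input has no effect on the output) would allow.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements, for illustration only.
treated_counts = [9, 8, 10, 9, 11, 10]
control_counts = [1, 0, 2, 1, 0, 1]
p_value = permutation_test(treated_counts, control_counts)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value here would suggest that the randomly assigned input affects the observed output, which is the causal reading of an information flow; a large one would leave the question open rather than show that no flow exists.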
