A scalable, flow-and-context-sensitive taint analysis of Android applications

Abstract This paper focuses on scalable static analysis techniques for finding information leaks in Android apps. Finding such leaks scalably is challenging because Android apps have on average over 100 invocations of sensitive APIs, yielding a massive multi-source taint analysis problem. We present the design of STAR, a context-sensitive and flow-sensitive multi-source taint analysis aimed at tackling this problem. STAR incorporates two main ideas to achieve high performance and scalability. The first is a novel summarization technique we refer to as symbolic summarization, which is crucial for the analysis to scale well with the number of source APIs. The second is a combination of techniques aimed at efficient propagation of abstract states both within and across method boundaries. Our experiments over a dataset composed of 400,000 apps show that the proposed techniques improve performance over an IFDS-style analysis by a factor of 30 on average, and by up to four orders of magnitude on large apps.
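To make the multi-source scaling problem concrete, the following is a minimal sketch of the general idea of summarizing a method once over symbolic placeholders and then instantiating that summary with concrete source labels at each call site. It is an illustration only and not STAR's actual algorithm, which the abstract does not spell out. The toy straight-line IR, the names `summarize` and `apply_summary`, and the source labels `IMEI`, `LOCATION`, and `CONTACTS` are all assumptions introduced here.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy IR: a method body is a straight-line list of
# (destination, sources) copies; the variable "ret" holds the return value.
Body = List[Tuple[str, Tuple[str, ...]]]
Labels = FrozenSet[str]


def summarize(params: List[str], body: Body) -> Labels:
    """Analyze the body once, binding each parameter to a symbolic label
    (its own name). The result says which parameters may reach "ret"."""
    env: Dict[str, Labels] = {p: frozenset([p]) for p in params}
    for dst, srcs in body:
        labels: Labels = frozenset()
        for s in srcs:
            labels |= env.get(s, frozenset())
        env[dst] = labels  # overwrite: later assignments replace earlier ones
    return env.get("ret", frozenset())


def apply_summary(summary: Labels, actuals: Dict[str, Labels]) -> Labels:
    """Instantiate the symbolic summary with the concrete taint labels
    carried by the arguments at one call site."""
    out: Labels = frozenset()
    for param in summary:
        out |= actuals.get(param, frozenset())
    return out


# One method, summarized a single time regardless of how many sources exist.
body: Body = [("t", ("a",)), ("u", ("t", "c")), ("ret", ("u",))]
summary = summarize(["a", "b", "c"], body)

# Each call site reuses the summary with its own (assumed) source labels.
result = apply_summary(
    summary,
    {
        "a": frozenset(["IMEI"]),
        "b": frozenset(["LOCATION"]),
        "c": frozenset(["CONTACTS"]),
    },
)
print(sorted(result))
```

In this sketch the method body is visited once, and the cost of handling an additional source label is confined to the cheap instantiation step rather than a fresh analysis of the callee.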
