Scaling static taint analysis to industrial SOA applications: a case study at Alibaba

In Alibaba, we have seen a growing demand for tracing data flow for scenarios such as data leak detection, change governance, and data consistency checking. Static taint analysis is a technique for such problems, and many approaches are proposed for high scalability and precision. This paper shares our experience in applying taint analysis in Alibaba. In particular, we find that the state-of-the-art taint analysis tool, FlowDroid, does not work well in our cases because our applications make heavy use of libraries, native methods and enterprise-specific frameworks, which impose two major challenges, scalability and implicit dependency, to FlowDroid. This paper presents ANTaint to address these problems. ANTaint improves scalability by expanding the call graph and applying taint propagation on demand for libraries, which account for majority of the program execution but only a small fraction propagates taints. To improve accuracy, we ensure to build a sound call graph with its core part having certain accuracy, and providing a more precise taint propagation model. The practice of applying ANTaint in the company workload validates the idea. According to an experiment on 60 production cases, ANTaint is correct for 95% of the cases (precision: 95%, recall: 98%) while FlowDroid is 13%. ANTaint takes 65% less time and none of the cases run out of memory with 32 GB limitation.

[1]  Andrew Warfield,et al.  Practical taint-based protection using demand emulation , 2006, EuroSys.

[2]  David Leon,et al.  An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows , 2007, IEEE Transactions on Software Engineering.

[3]  Jacques Klein,et al.  IccTA: Detecting Inter-Component Privacy Leaks in Android Apps , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[4]  Julian Dolby,et al.  Scalable and precise taint analysis for Android , 2015, ISSTA.

[5]  Yannis Smaragdakis,et al.  Static analysis of Java enterprise applications: frameworks and caches, the elephants in the room , 2020, PLDI.

[6]  Patrick Cousot,et al.  Andromeda: Accurate and Scalable Security Analysis of Web Applications , 2013, FASE.

[7]  Benjamin Livshits,et al.  Improving software insecurity with precise static and runtime analysis , 2006 .

[8]  Mark Weiser,et al.  Program Slicing , 1981, IEEE Transactions on Software Engineering.

[9]  Yannis Smaragdakis,et al.  P/Taint: unified points-to and taint analysis , 2017, Proc. ACM Program. Lang..

[10]  Sam Blackshear,et al.  Droidel: a general approach to Android framework modeling , 2015, SOAP@PLDI.

[11]  Barbara G. Ryder,et al.  Practical blended taint analysis for JavaScript , 2013, ISSTA.

[12]  Ondrej Lhoták,et al.  Scaling Java Points-to Analysis Using SPARK , 2003, CC.

[13]  Thomas W. Reps,et al.  Precise interprocedural dataflow analysis via graph reachability , 1995, POPL '95.

[14]  Isil Dillig,et al.  Precise reasoning for programs using containers , 2011, POPL '11.

[15]  Wenke Lee,et al.  CHEX: statically vetting Android apps for component hijacking vulnerabilities , 2012, CCS.

[16]  Stefan Mangard,et al.  SCAnDroid: Automated Side-Channel Analysis of Android APIs , 2018, WISEC.

[17]  Alessandro Orso,et al.  Effective memory protection using dynamic tainting , 2007, ASE '07.

[18]  Jacques Klein,et al.  FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps , 2014, PLDI.

[19]  Eric Bodden,et al.  Inter-procedural data-flow analysis with IFDS/IDE and Soot , 2012, SOAP '12.

[20]  Zhemin Yang,et al.  LeakMiner: Detect Information Leakage on Android with Static Taint Analysis , 2012, 2012 Third World Congress on Software Engineering.

[21]  Gregor Snelting,et al.  Efficient path conditions in dependence graphs for software safety analysis , 2006, TSEM.

[22]  Andrew C. Myers,et al.  JFlow: practical mostly-static information flow control , 1999, POPL '99.

[23]  David Leon,et al.  Detecting and debugging insecure information flows , 2004, 15th International Symposium on Software Reliability Engineering.

[24]  Eric Bodden,et al.  StubDroid: Automatic Inference of Precise Data-Flow Summaries for the Android Framework , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).