A Comparison of Android Reverse Engineering Tools via Program Behaviors Validation Based on Intermediate Languages Transformation

In Android, performing program analysis directly on an executable source is usually inconvenient. Therefore, reverse engineering techniques have been adopted to let a user perform program analysis on a textual form of the executable source, represented in an intermediate language (IL). For Android, the Smali, Jasmin, and Jimple ILs have been introduced to represent an application's executable Dalvik bytecode in a human-readable form. To use these ILs, we downloaded three of the most popular Android reversing tools, Apktool, dex2jar, and Soot, which transform the executable source into Smali, Jasmin, and Jimple, respectively. The main concern, however, is that an inaccurate transformation of the executable source may severely degrade the performance of the program analysis and obscure its results. To the best of our knowledge, it is still unknown which tool transforms the executable source most accurately, so that the re-assembled Android applications can be executed and their original behaviors remain intact. Therefore, in this paper, we conduct an experiment to identify the tool that performs the transformation most accurately. We designed a statistical event-based comparative scheme and conducted a comprehensive empirical study on a set of 1,300 Android applications. Using this scheme, we compare Apktool, dex2jar, and Soot via random-event-based and statistical tests to determine which tool allows the re-assembled applications to be executed, and to evaluate how closely those applications preserve their original behaviors. Our experimental results show that Apktool, using the Smali IL, performs the most accurate transformation of the executable source: the applications re-assembled from Smali exhibit behaviors closest to those of the originals.
