Android: From Reversing to Decompilation

This talk deals with the analysis of Android bytecode. The Android system is now widespread, and many applications are developed every day. These applications are mostly written in Java, though they can also call native binaries or shared libraries. To be executed on the Dalvik virtual machine (DVM), the Java source code is first compiled into Java bytecode (.class files), and then a tool named ‘dx’ converts it into the Dalvik format (.dex files). This conversion is needed because the DVM is a register-based machine whereas the JVM is a stack-based one, so the two have different opcodes. Due to the nature of the bytecode, reversing it is somewhat easier than reversing machine code. Indeed, unlike machine code, Dalvik bytecode contains semantic information that allows a better analysis: we can get useful details on variables, fields, methods, and so on. We can create signatures for a method, or we can use the Android permissions to see where a specific permission is used in an application. The analysis part allows us to extract the control flow graph, which represents the different possible paths through an application; it is composed of basic blocks and cannot be modified dynamically because of the virtual machine. Furthermore, we have implemented new algorithms to calculate the similarity distance between two applications, which is useful for finding out whether your application has been stolen from the Android Market. Similarity can also be used for ‘diffing’ Android applications, which is useful for spotting bug patches or the insertion of malicious code; this is why we have developed a combination of techniques to quickly see the differences between two applications. Moreover, it is useful to be able to manipulate all these new formats (APK, DEX, Dalvik bytecode, Android’s binary XML) in a simple way, in order to automate testing directly in a program or in a dedicated interpreter.
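To make the idea of a similarity distance concrete, here is a minimal sketch of one well-known compression-based measure, the normalized compression distance (NCD). The abstract does not say which algorithm the tool uses, so the choice of NCD, the use of zlib as the compressor, and the function names `compressed_size` and `ncd` are all assumptions made for illustration. The inputs are hypothetical byte strings standing in for method signatures or bytecode.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form of data (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their content;
    values near 1 suggest they have little in common.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical stand-ins for data extracted from two applications.
    original = b"invoke-virtual v0, v1\n" * 50
    unrelated = bytes(range(256)) * 4
    print("same content:     ", ncd(original, original))
    print("unrelated content:", ncd(original, unrelated))
```

In practice the distance would likely be computed per method and then aggregated, but the way that is done here is not specified in the abstract.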
There are several ways to recover the Java source code of an application from its bytecode. For instance, some people use a tool that transforms Dex bytecode into Java bytecode and then run a regular Java decompiler on the result. However, the resulting code often looks more like an obfuscated version that does not compile than like real source code. That is why we developed a new decompiler that works directly on Dalvik bytecode to reconstruct Java source code. We present a new open-source tool, Androguard, written in Python (with some parts in C), which helps with reversing Android applications, as well as our decompiler.
