A Look Back on a Function Identification Problem

Function identification serves as a basis for further binary analysis and many downstream applications. Although the common challenges of function detection are well known, prior work has repeatedly reported high precision and recall. In this paper, we examine what has been overlooked or misinterpreted by closely inspecting the datasets, metrics, and evaluations used in previous work, supported by varying case studies. Our major findings are that i) a common corpus such as GNU utilities is insufficient to represent the effectiveness of function identification, ii) it is difficult to claim, at least in its current form, that an ML-oriented approach is scientifically superior to deterministic ones such as IDA or Ghidra, iii) the current metrics may not be adequate for measuring varying function detection cases, and iv) the capability of recognizing functions depends on each tool's strategic or peculiar choices. We re-evaluate existing approaches on our own dataset and show that no single state-of-the-art tool dominates all the others. In conclusion, the function detection problem has not yet been fully addressed, and better methodologies and metrics are needed to advance the field of function identification.
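
For reference, the precision and recall figures discussed above are usually computed over sets of function start addresses. The following is a minimal sketch of that conventional computation, not the evaluation code of this paper or of any tool named here; the function name `score_function_starts` and the example addresses are hypothetical and chosen only for illustration.

```python
def score_function_starts(predicted, ground_truth):
    """Return (precision, recall, f1) for two collections of start addresses.

    Assumption: a prediction counts as correct only if its address exactly
    matches a ground-truth function start. Other matching rules are possible.
    """
    predicted = set(predicted)
    ground_truth = set(ground_truth)

    # Addresses reported by the tool that are real function starts.
    true_positives = len(predicted & ground_truth)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: four real functions, three reported, two correct.
truth = {0x1000, 0x1040, 0x10A0, 0x1100}
found = {0x1000, 0x1040, 0x10B0}
precision, recall, f1 = score_function_starts(found, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A score of this kind treats every function start as equally important and ignores function ends. This simplification is one reason a single pair of numbers may not capture varying function detection cases.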
