Automated Binary Program Partitioning through Structural Analysis

Binary code has emerged as the standard format for storing applications in both industry and academia, owing to its lightweight nature, easy installation, and fast processing. The growing number of outdated legacy applications that exist only in binary form has created a demand for efficient tools that can process and update them automatically. However, processing these binaries accurately is challenging because their functionality is hidden, embedded in multiple subtasks within the program. This paper presents a methodology for the automated structural analysis and partitioning of binary programs. The method iteratively analyzes subgraphs extracted from the control-flow graph of the complete binary, using a hybrid approach that combines a Deep Graph Convolutional Neural Network model with a formal analysis module, and identifies the graph structures that represent different functionalities. The study focuses on four functionalities: matrix multiplication, matrix transposition, array processing, and mathematical functions. Experimental results on a synthetic dataset show that the proposed method isolates the nodes associated with these functionalities with high accuracy. These nodes are then annotated within the control-flow graph, which makes the binary easier to process with legacy-code analysis tools.
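
The general idea of scanning a control-flow graph subgraph by subgraph and annotating the nodes of the matching ones can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how subgraphs are extracted, so a k-hop neighbourhood around each basic block is used here, and `toy_classifier` is a hypothetical placeholder that only checks for a back edge, standing in for the learned graph model. The formal analysis module is not modelled at all.

```python
from collections import deque


def k_hop_subgraph(cfg, root, k):
    """Return (nodes, edges) of the subgraph within k edges of `root`.

    `cfg` maps a basic-block id to the list of its successor ids.
    The k-hop extraction rule is an assumption, not the paper's method.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for succ in cfg.get(node, ()):
            if succ not in depth:
                depth[succ] = depth[node] + 1
                queue.append(succ)
    nodes = set(depth)
    # Keep only edges whose endpoints both lie inside the subgraph.
    edges = {(u, v) for u in nodes for v in cfg.get(u, ()) if v in nodes}
    return nodes, edges


def toy_classifier(nodes, edges):
    """Hypothetical stand-in for the learned model.

    Returns the label "loop" when the subgraph contains an edge to a block
    with a lower or equal id (a crude back-edge test), otherwise None.
    """
    return "loop" if any(v <= u for u, v in edges) else None


def annotate(cfg, classify, k=1):
    """Classify the subgraph around every block and label its nodes."""
    annotations = {}
    for root in cfg:
        nodes, edges = k_hop_subgraph(cfg, root, k)
        label = classify(nodes, edges)
        if label is not None:
            for node in nodes:
                annotations.setdefault(node, set()).add(label)
    return annotations


# Toy control-flow graph: block 2 jumps back to block 1, forming a loop.
cfg = {0: [1], 1: [2], 2: [1, 3], 3: []}
annotations = annotate(cfg, toy_classifier, k=1)
print(annotations)
```

In this toy graph, block 0 is left unannotated because its 1-hop neighbourhood contains no back edge, while blocks 1, 2, and 3 each fall inside at least one neighbourhood that does. A real pipeline would replace `toy_classifier` with a trained graph classifier and would need a rule for narrowing the labelled region down to the relevant nodes.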
