Software-Architecture Recovery from Machine Code ∗

In this paper, we present a tool, called Lego, which recovers object-oriented software architecture from stripped binaries. Lego takes a stripped binary as input, and uses information obtained from dynamic analysis to (i) group the functions in the binary into classes, and (ii) identify inheritance and composition relationships between the inferred classes. The information obtained by Lego can be used for reengineering legacy software, and for understanding the architecture of software systems that lack documentation and source code. Our experiments show that the class hierarchies recovered by Lego have a high degree of agreement—measured in terms of precision and recall—with the hierarchy defined in the source code.

[1]  David Brumley,et al.  TIE: Principled Reverse Engineering of Types in Binary Programs , 2011, NDSS.

[2]  Oscar Nierstrasz,et al.  Object-oriented reengineering patterns , 2004, Proceedings. 26th International Conference on Software Engineering.

[3]  Onaiza Maqbool,et al.  Hierarchical Clustering for Software Architecture Recovery , 2007, IEEE Transactions on Software Engineering.

[4]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[5]  David Notkin,et al.  Lightweight lexical source model extraction , 1996, TSEM.

[6]  Herbert Bos,et al.  Howard: A Dynamic Excavator for Reverse Engineering Data Structures , 2011, NDSS.

[7]  Rick Kazman,et al.  Playing Detective: Reconstructing Software Architecture from Available Evidence , 1999, Automated Software Engineering.

[8]  Melissa P. Chase,et al.  Manipulating Recovered Software Architecture Views , 1997, Proceedings of the (19th) International Conference on Software Engineering.

[9]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[10]  Gregor Snelting,et al.  Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis , 1997, Proceedings of the (19th) International Conference on Software Engineering.

[11]  Hong Yan,et al.  DiscoTect: a system for discovering architectures from running systems , 2004, Proceedings. 26th International Conference on Software Engineering.

[12]  Thomas W. Reps,et al.  DIVINE: DIscovering Variables IN Executables , 2007, VMCAI.

[13]  Thomas W. Reps,et al.  WYSINWYX: What you see is not what you eXecute , 2005, TOPL.

[14]  Samuel T. King,et al.  Digging for Data Structures , 2008, OSDI.

[15]  Thomas W. Reps,et al.  Identifying modules via concept analysis , 1997, 1997 Proceedings International Conference on Software Maintenance.

[16]  Paolo Tonella,et al.  Concept Analysis for Module Restructuring , 2001, IEEE Trans. Software Eng..

[17]  Alexander S. Yeh,et al.  Reverse Engineering to the Architectural Level , 1995, 1995 17th International Conference on Software Engineering.

[18]  René L. Krikhaar,et al.  Reverse architecting approach for complex systems , 1997, 1997 Proceedings International Conference on Software Maintenance.

[19]  Dusan M. Velasevic,et al.  A use-case driven method of architecture recovery for program understanding and reuse reengineering , 2000, Proceedings of the Fourth European Conference on Software Maintenance and Reengineering.

[20]  Rick Kazman,et al.  View extraction and view fusion in architectural understanding , 1998, Proceedings. Fifth International Conference on Software Reuse (Cat. No.98TB100203).

[21]  Barton P. Miller,et al.  Labeling library functions in stripped binaries , 2011, PASTE '11.