A Program Extractor Suite for C and C++: Choosing the Right Tool for the Job

This report describes a suite of program extractors that we developed by adopting and extending existing program parsing and extraction techniques or tools. The suite is called CX because it mainly targets extracting facts from C and C++ programs. It currently comprises four extractors: CPPX, BFX, LDX, and CTSX. The main goal of CX is to provide a convenient set of program extractors that complement each other and work together systematically. We discuss the benefits of the suite in terms of two practical applications: (1) creating program comprehension pipelines to support various understanding tasks, and (2) building an open source software evolution database (EvolDB) to support empirical research on software evolution.
