Effective pattern matching of source code using abstract syntax patterns

Program understanding can be assisted by tools that match patterns in the program source. Lexical pattern matchers provide excellent performance and ease of use, but have a limited vocabulary. Syntactic matchers provide more precision, but may sacrifice performance, robustness, or power. To achieve more of the benefits of both models, we extend the pattern syntax of AWK to support matching of abstract syntax trees, as demonstrated in a tool called TAWK. Its pattern syntax is language-independent and based on abstract tree patterns. As in AWK, patterns can have associated actions, which in TAWK are written in C for generality, familiarity, and performance. The use of C is simplified by high-level libraries and dynamic linking. To allow processing of program files that contain non-syntactic constructs such as textual macros, we designed mechanisms that match ‘language-like’ macros syntactically. We survey and apply prototypical approaches to concretely demonstrate the tradeoffs in program processing. Our results indicate that TAWK can be used to quickly and easily perform a variety of common software engineering tasks, and that the extensions for non-syntactic features significantly broaden the generality of syntactic matchers. Copyright © 2005 John Wiley & Sons, Ltd.
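The abstract does not give TAWK's concrete pattern syntax or its action API, so the following is only a minimal C sketch of the AWK-style model it describes: tree patterns are matched against every node of an abstract syntax tree, and each match fires an associated C action. Every name here (`Node`, `Pattern`, `Rule`, `match`, `scan`, `count_calls_to`) and the `"_"` wildcard convention are hypothetical illustrations, not TAWK's real interface.

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strcmp */

/* HYPOTHETICAL sketch: not TAWK's real tree representation or API. */
#define MAX_KIDS 4

/* An AST node: a kind label ("call", "ident", ...), optional leaf text,
   and up to MAX_KIDS children. */
typedef struct Node {
    const char *kind;
    const char *text;                  /* NULL for interior nodes */
    int nkids;
    const struct Node *kids[MAX_KIDS];
} Node;

/* A tree pattern reuses the node shape. Kind "_" is a wildcard that matches
   any subtree; a NULL text matches any leaf text. */
typedef Node Pattern;

/* Return 1 if pattern p matches the subtree rooted at n, else 0. */
static int match(const Pattern *p, const Node *n) {
    if (strcmp(p->kind, "_") == 0)
        return 1;
    if (strcmp(p->kind, n->kind) != 0)
        return 0;
    if (p->text != NULL && (n->text == NULL || strcmp(p->text, n->text) != 0))
        return 0;
    if (p->nkids != n->nkids)
        return 0;
    for (int i = 0; i < p->nkids; i++)
        if (!match(p->kids[i], n->kids[i]))
            return 0;
    return 1;
}

/* An AWK-style rule: a tree pattern plus a C action run on each match. */
typedef struct {
    const Pattern *pattern;
    void (*action)(const Node *matched, void *state);
} Rule;

/* Visit every node in preorder and fire each rule whose pattern matches,
   much as AWK tries every pattern-action pair on every input record. */
static void scan(const Node *n, const Rule *rules, size_t nrules, void *state) {
    for (size_t r = 0; r < nrules; r++)
        if (match(rules[r].pattern, n))
            rules[r].action(n, state);
    for (int i = 0; i < n->nkids; i++)
        scan(n->kids[i], rules, nrules, state);
}

/* Example action: increment the int counter passed through `state`. */
static void count_action(const Node *matched, void *state) {
    (void)matched;
    ++*(int *)state;
}

/* Count calls to `fname` with any argument list: pattern call(ident fname, _). */
static int count_calls_to(const Node *root, const char *fname) {
    Pattern callee = {"ident", fname, 0, {NULL}};
    Pattern any_args = {"_", NULL, 0, {NULL}};
    Pattern call = {"call", NULL, 2, {&callee, &any_args}};
    Rule rules[] = {{&call, count_action}};
    int count = 0;
    scan(root, rules, sizeof rules / sizeof rules[0], &count);
    return count;
}

/* Hand-built AST, roughly for:  strcpy(dst, src); puts(dst); strcpy(a, b); */
static const Node id_dst = {"ident", "dst", 0, {NULL}};
static const Node id_src = {"ident", "src", 0, {NULL}};
static const Node id_a = {"ident", "a", 0, {NULL}};
static const Node id_b = {"ident", "b", 0, {NULL}};
static const Node id_strcpy = {"ident", "strcpy", 0, {NULL}};
static const Node id_puts = {"ident", "puts", 0, {NULL}};
static const Node args1 = {"args", NULL, 2, {&id_dst, &id_src}};
static const Node args2 = {"args", NULL, 1, {&id_dst}};
static const Node args3 = {"args", NULL, 2, {&id_a, &id_b}};
static const Node call1 = {"call", NULL, 2, {&id_strcpy, &args1}};
static const Node call2 = {"call", NULL, 2, {&id_puts, &args2}};
static const Node call3 = {"call", NULL, 2, {&id_strcpy, &args3}};
static const Node sample = {"block", NULL, 3, {&call1, &call2, &call3}};
```

With the sample tree, `count_calls_to(&sample, "strcpy")` returns 2 and `count_calls_to(&sample, "puts")` returns 1. Matching the whole argument list with one wildcard is what lets a single pattern cover calls of any arity. The sketch links its action statically and ignores textual macros; it does not model TAWK's dynamic linking of actions or its macro-matching mechanisms.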
