Fast, flexible syntactic pattern matching and processing

Program understanding can be assisted by tools that match patterns in the program source. Lexical pattern matchers provide excellent performance and ease of use, but have a limited vocabulary. Syntactic matchers provide more precision, but may sacrifice performance, retargetability, ease of use, or generality. To achieve more of the benefits of both models, we extend the pattern syntax of AWK to support matching of abstract syntax trees, as demonstrated in a tool called TAWK. Its pattern syntax is language-independent and based on abstract tree patterns. As in AWK, patterns can have associated actions, which in TAWK are written in C for generality, familiarity, and performance. High-level libraries and dynamic linking simplify the use of C. To process program files that contain non-syntactic constructs, we designed mechanisms that allow these constructs to be matched transparently, in a syntactic fashion. So far TAWK has been retargeted to the MUMPS and C programming languages. We survey and apply prototypical approaches to demonstrate the tradeoffs concretely. Our results indicate that TAWK can be used to quickly and easily perform a variety of common software engineering tasks, and that the extensions to accommodate non-syntactic features significantly extend the generality of syntactic matchers.
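The pattern-plus-action model described above can be illustrated with a minimal sketch in C. This is not TAWK's pattern syntax or library interface, neither of which is shown here; the `Node`, `Pat`, `Action`, `matches`, `scan`, `count_action`, and `demo_count_malloc_calls` names are hypothetical, and the tree shape is an assumption made only for illustration. The sketch shows the general idea: a tree pattern with wildcards is tested against every subtree of an abstract syntax tree, and a C callback runs on each match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical AST node: a kind label, optional token text, and children. */
typedef struct Node {
    const char *kind;        /* e.g. "Call", "Ident" */
    const char *text;        /* token text for leaves, NULL otherwise */
    struct Node **kids;
    size_t nkids;
} Node;

/* Hypothetical tree pattern: kind == NULL is a wildcard for any subtree. */
typedef struct Pat {
    const char *kind;
    const char *text;        /* NULL means "any text" */
    const struct Pat **kids;
    size_t nkids;
} Pat;

/* Action run on each matching subtree, in the spirit of an AWK action. */
typedef void (*Action)(const Node *match, void *ctx);

/* Return 1 if pattern p matches the subtree rooted at n, else 0. */
int matches(const Pat *p, const Node *n) {
    if (p->kind == NULL) return 1;                      /* wildcard */
    if (strcmp(p->kind, n->kind) != 0) return 0;
    if (p->text != NULL &&
        (n->text == NULL || strcmp(p->text, n->text) != 0)) return 0;
    if (p->nkids != n->nkids) return 0;
    for (size_t i = 0; i < p->nkids; i++)
        if (!matches(p->kids[i], n->kids[i])) return 0;
    return 1;
}

/* Visit every subtree in preorder; run act on each match; return hit count. */
size_t scan(const Node *root, const Pat *p, Action act, void *ctx) {
    size_t hits = 0;
    if (matches(p, root)) {
        if (act != NULL) act(root, ctx);
        hits++;
    }
    for (size_t i = 0; i < root->nkids; i++)
        hits += scan(root->kids[i], p, act, ctx);
    return hits;
}

/* Example action: increment a counter passed through ctx. */
void count_action(const Node *match, void *ctx) {
    (void)match;
    ++*(size_t *)ctx;
}

/* Build a tiny tree for a block containing malloc(16) and x, then count
   calls whose callee is named "malloc", with any argument. */
size_t demo_count_malloc_calls(void) {
    Node callee = {"Ident", "malloc", NULL, 0};
    Node arg = {"Num", "16", NULL, 0};
    Node *call_kids[] = {&callee, &arg};
    Node call = {"Call", NULL, call_kids, 2};
    Node other = {"Ident", "x", NULL, 0};
    Node *root_kids[] = {&call, &other};
    Node root = {"Block", NULL, root_kids, 2};

    Pat any = {NULL, NULL, NULL, 0};
    Pat name = {"Ident", "malloc", NULL, 0};
    const Pat *pat_kids[] = {&name, &any};
    Pat pat = {"Call", NULL, pat_kids, 2};

    size_t count = 0;
    scan(&root, &pat, count_action, &count);
    return count;
}
```

A lexical matcher searching for the string `malloc` would also hit comments and identifiers such as `my_malloc`; the tree pattern here only matches a call node whose callee is exactly that identifier, which is the precision tradeoff the abstract refers to.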
