A Language-Parametric Modular Framework for Mining Idiomatic Code Patterns

In an ongoing industry-university collaboration, we are developing a language-parametric framework for mining code idioms in legacy systems. This modular framework has a pipeline architecture and a language-parametric meta-representation of the artefacts used by each of its five components: source code importer, mining preprocessor, pattern miner, pattern matcher, and modernisation assistant. The pipeline enables reuse of its components across systems and languages, and allows project partners to work on each component separately. For example, novel pattern mining techniques can be explored independently of the languages to which they will be applied and of the modernisation assistant in which they will be used. Our first results on mining Java and COBOL code are promising, although challenges remain in making the framework and its constituent components truly scalable, customisable, and language independent.
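
The pipeline idea can be illustrated with a minimal sketch. It assumes that the language-parametric representation is a plain labelled tree, and every name in it (Node, from_tuple, shape, mine, match) is hypothetical rather than part of the framework. The counting miner is a naive stand-in for a real frequent subtree mining technique, and the modernisation assistant is left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical language-neutral tree node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def from_tuple(t: tuple) -> Node:
    """Stand-in importer: builds a Node tree from nested tuples.
    A real importer would wrap a language-specific parser (assumption)."""
    label, *kids = t
    return Node(label, [from_tuple(k) for k in kids])


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def shape(node: Node, depth: int) -> tuple:
    """Stand-in preprocessor: abstract a subtree to its labels, cut off at `depth`."""
    if depth == 0:
        return (node.label,)
    return (node.label, *[shape(c, depth - 1) for c in node.children])


def mine(trees: List[Node], depth: int = 1, min_support: int = 2) -> Dict[tuple, int]:
    """Naive miner: count truncated subtree shapes and keep the frequent ones."""
    counts = Counter(shape(n, depth) for t in trees for n in walk(t))
    return {p: c for p, c in counts.items() if c >= min_support}


def match(pattern: tuple, trees: List[Node], depth: int = 1) -> List[Node]:
    """Stand-in matcher: return every node whose truncated shape equals the pattern."""
    return [n for t in trees for n in walk(t) if shape(n, depth) == pattern]


# Two toy inputs standing in for imported Java and COBOL code (invented data).
java_like = ("Method", ("If", ("Cond",), ("Call",)), ("If", ("Cond",), ("Call",)))
cobol_like = ("Paragraph", ("If", ("Cond",), ("Call",)), ("Move",))

trees = [from_tuple(java_like), from_tuple(cobol_like)]
patterns = mine(trees, depth=1, min_support=3)
guarded_call = ("If", ("Cond",), ("Call",))
occurrences = match(guarded_call, trees, depth=1)
```

Because the miner and the matcher only ever see Node trees, either could be replaced without touching the importers, which is the kind of component independence the abstract describes.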
