Ambiguous, informal, and unsound: metaprogramming for naturalness

Program code needs to be understood by both machines and programmers. While executing programs requires the unambiguity of a formal language, programmers use natural language within these formal constraints to explain implemented concepts to each other. This so-called naturalness, the property of programs to resemble human communication, has motivated many statistical and machine learning (ML) approaches aimed at improving software engineering activities. The metaprogramming facilities of most programming environments model the formal elements of a program (meta-objects). When ML is used to support engineering or analysis tasks, three problems follow: complex infrastructure is needed to bridge the gap between meta-objects and ML models, changes to the program are not reflected in the ML model, and mapping ML output back into the program’s meta-object domain is laborious. In this work, we propose extending metaprogramming facilities to give tool developers access to the representations of program elements within an exchangeable ML model. We demonstrate the usefulness of this abstraction in two case studies on test prioritization and refactoring. We conclude that aligning ML representations with the program’s formal structure lowers the entry barrier to exploiting statistical properties in tool development.
