To design and implement a program, programmers choose analogies and metaphors to explain and understand programmatic concepts. In source code, these choices manifest themselves as particular names. During program comprehension, reading such names is an important starting point for understanding the meaning of modules and for guiding exploration. On the one hand, understanding a program in depth by searching for names that suggest a particular analogy can be time-consuming. On the other hand, a lack of awareness of which concepts are present and which analogies have been chosen can lead to modularity issues, such as redundancy and architectural drift, if concepts are misaligned with the current module decomposition. In this work-in-progress paper, we propose to integrate first-class concepts into the programming environment. We assign meaning to names by labeling each name with a color corresponding to the metaphor or analogy from which it was derived. We hypothesize that aggregating labels upwards along the module hierarchy helps to understand how concepts are distributed across the program, that collecting the names belonging to a specific concept helps programmers recognize which metaphor has been chosen, and that presenting relations between concepts can summarize complex interactions between program parts. We argue that continuous feedback and awareness of how names are grouped into concepts, and of where they are located, can help prevent modularity issues and ease program comprehension. As a first step towards an implementation, we define criteria that help to detect names belonging to the same concept. We then investigate how techniques from natural language processing can be reused and adapted to compute an initial concept allocation with respect to these criteria.
Finally, we present design sketches of how we plan to arrange and present concepts to programmers through tools, and of what kind of information these tools can provide to help programmers make informed implementation decisions.
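Aggregating concept labels upwards along the module hierarchy, and collecting all names that belong to one concept, can both be expressed as simple recursive folds over a module tree. The following is a minimal Python sketch, assuming every name has already been assigned a concept label (for example by the natural-language-processing techniques the paper investigates). `Module`, `aggregate`, `names_of`, and the example labels are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Module:
    """A node in a hypothetical module hierarchy (package, class, ...)."""
    name: str
    # Assumed input: each name defined directly in this module mapped to the
    # concept (metaphor / color) it was labeled with.
    labels: dict[str, str] = field(default_factory=dict)
    children: list[Module] = field(default_factory=list)


def aggregate(module: Module) -> Counter[str]:
    """Count concept labels in this module and, recursively, in all submodules."""
    totals = Counter(module.labels.values())
    for child in module.children:
        totals += aggregate(child)
    return totals


def names_of(module: Module, concept: str, prefix: str = "") -> list[str]:
    """Collect the qualified names labeled with `concept` across the hierarchy."""
    path = f"{prefix}{module.name}"
    found = [f"{path}.{n}" for n, c in module.labels.items() if c == concept]
    for child in module.children:
        found += names_of(child, concept, path + ".")
    return found


# Hypothetical example: a "bank" name inside the scheduler is the kind of
# misalignment an aggregated view could make visible.
ledger = Module("ledger", {"Account": "bank", "deposit": "bank", "withdraw": "bank"})
scheduler = Module("scheduler", {"JobQueue": "waiting line",
                                 "enqueue": "waiting line",
                                 "overdraft": "bank"})
app = Module("app", {"main": "program"}, [ledger, scheduler])

print(aggregate(app).most_common())
# → [('bank', 4), ('waiting line', 2), ('program', 1)]
print(names_of(app, "bank"))
# → ['app.ledger.Account', 'app.ledger.deposit', 'app.ledger.withdraw', 'app.scheduler.overdraft']
```

Because each subtree's counts are computed independently, a tool could cache and recompute them per module after an edit, which would suit the continuous feedback argued for above; the difficult part, assigning the labels in the first place, is taken as given in this sketch.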