[Journal First] Predicting Future Developer Behavior in the IDE Using Topic Models

Interaction data, gathered from developers' daily clicks and key presses in the IDE, has found use in both empirical studies and recommendation systems for software engineering. We observe that this data has several characteristics common across IDEs. It is 1) exponentially distributed: a few events or commands dominate the trace (e.g., cursor movement commands), while most other commands occur relatively infrequently; 2) noisy: the traces include spurious commands (or clicks) and unrelated events that may not be important to the behavior of interest; and 3) composed of overlapping events and commands: a specific command can be invoked by separate mechanisms, and similar events can be triggered by different sources. These characteristics are analogous to synonymy and polysemy in natural language corpora. This paper (and presentation) therefore introduces a new modeling approach for this type of data, leveraging topic models that are typically applied to streams of natural language text.
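
The analogy to text can be made concrete by treating each interaction session as a document and each command as a word. The sketch below is only an illustration under that assumption, not the paper's implementation: the sessions and command names are hypothetical, and the resulting bag-of-commands counts are simply the kind of input a topic model such as Latent Dirichlet Allocation would typically consume.

```python
from collections import Counter

# Hypothetical IDE sessions; the command names are illustrative only
# and are not taken from the paper's dataset.
sessions = [
    ["MoveCursor", "MoveCursor", "Edit.Paste", "Build.Run", "MoveCursor"],
    ["Debug.Start", "Debug.StepOver", "MoveCursor", "Debug.StepOver"],
    ["MoveCursor", "Edit.Copy", "Edit.Paste", "MoveCursor", "Build.Run"],
]


def bag_of_commands(sessions):
    """Return (vocabulary, count matrix), treating sessions as documents."""
    # The vocabulary is the sorted set of distinct commands (the "words").
    vocabulary = sorted({cmd for session in sessions for cmd in session})
    index = {cmd: i for i, cmd in enumerate(vocabulary)}
    matrix = []
    for session in sessions:
        # One row per session: how often each command occurs in it.
        row = [0] * len(vocabulary)
        for cmd, count in Counter(session).items():
            row[index[cmd]] = count
        matrix.append(row)
    return vocabulary, matrix


vocabulary, matrix = bag_of_commands(sessions)

# Overall command frequencies; in this toy data a cursor command dominates,
# mirroring the skewed distribution described above.
totals = Counter(cmd for session in sessions for cmd in session)

for cmd, count in totals.most_common():
    print(f"{cmd}: {count}")
```

Each row of `matrix` is one session expressed as command counts. Discarding command order in this way is the usual bag-of-words simplification; whether it suits a given trace is a modeling choice.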
