Multi-Perspective Exploratory Analysis of Software Development Data

In this paper, we present Dominoes, an approach for analyzing software repositories with thousands of artifacts by considering multiple perspectives of the software development data. To obtain the required computational power, we model the data and its relationships as matrices, which makes it possible to process them efficiently on GPU (Graphics Processing Unit) based architectures. Dominoes supports automated exploration of different relationships among project artifacts, and users can interactively combine and compose these relationships. Our solution organizes data extracted from software repositories into multiple matrices that can be treated as domino pieces (e.g., [commit|method]). Connecting such pieces corresponds to a set of matrix operations, which derive additional domino pieces. These derived domino pieces represent specific relationships between project entities (e.g., the number of commits in which two methods co-occurred) and can be used for further exploration. To evaluate the Dominoes framework, we present two exploratory case studies based on Apache Derby. First, we use Dominoes to show how dependencies among artifacts can be derived. Then, we identify the expertise of developers by considering the commits that developers make to artifacts. We show that identifying relationships among 34,335 elements across 7,578 commits takes about 0.2 minutes on a GPU, while the same processing on a CPU takes about 413 minutes. Similarly, identifying developer expertise for a set of 34,335 files and 36 developers takes about 0.1 minutes on a GPU, whereas on a CPU it takes 324 minutes.
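The idea of connecting domino pieces through matrix operations can be illustrated with a minimal CPU-only sketch in plain Python. It is not the Dominoes implementation: the toy data, the [developer|commit] piece, and the helper names `transpose` and `matmul` are assumptions introduced here for illustration. Under those assumptions, multiplying the transposed [commit|method] piece by itself yields a [method|method] piece that counts co-occurrences, and multiplying [developer|commit] by [commit|method] yields a [developer|method] piece that counts how often each developer changed each method.

```python
# Hypothetical toy data: rows are commits, columns are methods.
# commit_method[c][m] == 1 means commit c changed method m.
commit_method = [
    [1, 1, 0],  # commit 0 changed methods 0 and 1
    [1, 0, 1],  # commit 1 changed methods 0 and 2
    [1, 1, 0],  # commit 2 changed methods 0 and 1
]

# Hypothetical [developer|commit] piece: rows are developers, columns are commits.
developer_commit = [
    [1, 0, 1],  # developer 0 authored commits 0 and 2
    [0, 1, 0],  # developer 1 authored commit 1
]


def transpose(matrix):
    """Flip a piece, e.g. [commit|method] becomes [method|commit]."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Connect two pieces whose inner dimensions match, e.g. [x|y] * [y|z] = [x|z]."""
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


# [method|commit] * [commit|method] = [method|method]
# Entry (i, j) is the number of commits in which methods i and j co-occurred.
method_commit = transpose(commit_method)
method_method = matmul(method_commit, commit_method)

# [developer|commit] * [commit|method] = [developer|method]
# Entry (d, m) is the number of commits by developer d that changed method m.
developer_method = matmul(developer_commit, commit_method)

print(method_method)     # [[3, 2, 1], [2, 2, 0], [1, 0, 1]]
print(developer_method)  # [[2, 2, 0], [1, 0, 1]]
```

The diagonal of the derived [method|method] piece holds the number of commits that changed each method, and the off-diagonal entries hold pairwise co-occurrence counts. Because each output entry is an independent dot product, this kind of computation parallelizes naturally, which is consistent with the GPU speedups reported in the abstract.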
