Using the GPGPU for scaling up Mining Software Repositories

The Mining Software Repositories (MSR) field integrates and analyzes data stored in repositories such as source control and bug repositories to support practitioners. Given the abundance of repository data, scaling up MSR analyses has become a major challenge. Recently, researchers have experimented with conventional techniques like a supercomputer or cloud computing, but these are either too expensive or too hard to configure. This paper proposes to scale up MSR analysis using “general-purpose computing on graphics processing units” (GPGPU) on off-the-shelf video cards. In a representative MSR case study to measure co-change on version history of the Eclipse project, we find that the GPU approach is up to a factor of 43.9 faster than a CPU-only approach.

[1]  Ahmed E. Hassan,et al.  MapReduce as a general framework to support research in Mining Software Repositories (MSR) , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[2]  Audris Mockus,et al.  Amassing and indexing a large sample of version control systems: Towards the census of public source code history , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[3]  A.E. Hassan,et al.  The road ahead for Mining Software Repositories , 2008, 2008 Frontiers of Software Maintenance.

[4]  S.A. Manavski,et al.  CUDA Compatible GPU as an Efficient Hardware Accelerator for AES Cryptography , 2007, 2007 IEEE International Conference on Signal Processing and Communications.

[5]  Cole Trapnell,et al.  Optimizing data intensive GPGPU computations for DNA sequence alignment , 2009, Parallel Comput..

[6]  Ahmed E. Hassan,et al.  Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report , 2012, J. Syst. Softw..

[7]  Audris Mockus,et al.  High-impact defects: a study of breakage and surprise defects , 2011, ESEC/FSE '11.