Build Predictor: More Accurate Missed Dependency Prediction in Build Configuration Files

A software build system (e.g., Make) plays an important role in compiling human-readable source code into an executable program. A Make-based build system uses a build configuration file (e.g., a Makefile) to record the dependencies among targets and source code files. However, important dependencies are sometimes missing from a build configuration file, and finding and fixing them costs additional debugging effort. In this paper, we propose a novel algorithm named Build Predictor to mine the missed dependencies. We first analyze the dependencies in a build configuration file (e.g., a Makefile) and establish a dependency graph that captures them. Next, because a build configuration file is constructed based on the dependency relationships in the source code, we also establish a code dependency graph (code graph). Build Predictor is a composite model that combines the dependency graph and the code graph to achieve high prediction performance. To evaluate the effectiveness of our algorithm, we collected 7 build configuration files from open source projects: Zlib, putty, vim, Apache Portable Runtime (APR), memcached, nginx, and Tengine. The experimental results show that, compared with the state-of-the-art link prediction algorithms used by Xia et al., Build Predictor achieves the best performance in predicting missed dependencies.
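To make the idea of a dependency graph and link prediction concrete, the following is a minimal sketch, not the Build Predictor algorithm itself. It assumes a toy Makefile (the `MAKEFILE` string is invented for illustration), parses only plain `target: prerequisites` lines, and ranks unlinked node pairs by their number of common neighbors, one of the local link prediction indices from the literature. The code graph and the composite model are not modeled here; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy Makefile, invented for illustration only.
MAKEFILE = """\
app: main.o util.o
main.o: main.c util.h
util.o: util.c util.h
test.o: test.c
"""


def parse_rules(text):
    """Return {target: set(prerequisites)} for plain 'target: deps' lines.

    Assumption: no variables, pattern rules, or line continuations.
    """
    deps = defaultdict(set)
    for line in text.splitlines():
        stripped = line.strip()
        # Skip recipe lines (tab-indented), comments, and non-rule lines.
        if line.startswith("\t") or stripped.startswith("#") or ":" not in line:
            continue
        target, _, prereqs = line.partition(":")
        deps[target.strip()].update(prereqs.split())
    return deps


def build_adjacency(deps):
    """Treat each target/prerequisite pair as an undirected edge."""
    adj = defaultdict(set)
    for target, prereqs in deps.items():
        for prereq in prereqs:
            adj[target].add(prereq)
            adj[prereq].add(target)
    return adj


def common_neighbor_scores(adj):
    """Score every non-adjacent pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, so not a candidate missing dependency
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    return scores


deps = parse_rules(MAKEFILE)
adj = build_adjacency(deps)
scores = common_neighbor_scores(adj)

# Print candidate missing links, highest score first.
for pair, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, score)
```

A higher score only marks a pair as a candidate worth inspecting. In this sketch the graph is undirected, so a suggested pair does not say which file should depend on which.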

[1]  T. Epperly,et al.  Software in the DOE: The Hidden Overhead of "The Build" , 2002 .

[2]  Yang Feng,et al.  Towards more accurate multi-label software behavior learning , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[3]  Peter Smith.  Software Build Systems: Principles and Experience , 2011 .

[4]  Michael W. Godfrey,et al.  Build system issues in multilanguage software , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[5]  Alessandro Orso,et al.  Are automated debugging techniques actually helping programmers? , 2011, ISSTA '11.

[6]  Linyuan Lü,et al.  Predicting missing links via local information , 2009, arXiv:0901.0553.

[7]  Wolfgang De Meuter,et al.  Design recovery and maintenance of build systems , 2007, 2007 IEEE International Conference on Software Maintenance.

[8]  David Lo,et al.  An empirical study of bugs in build process , 2014, SAC.

[9]  Ying Zou,et al.  An empirical study of build system migrations in practice: Case studies on KDE and the Linux kernel , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[10]  Hung Viet Nguyen,et al.  SYMake: a build code analysis and refactoring tool for makefiles , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[11]  Tim Menzies,et al.  Automated severity assessment of software defect reports , 2008, 2008 IEEE International Conference on Software Maintenance.

[12]  Shane McIntosh,et al.  An empirical study of build maintenance effort , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[13]  David Lo,et al.  Identifying Linux bug fixing patches , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[14]  David Lo,et al.  Tag recommendation in software information sites , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[15]  David Lo,et al.  Build system analysis with link prediction , 2014, SAC.

[16]  Michael W. Godfrey,et al.  The build-time software architecture view , 2001, Proceedings IEEE International Conference on Software Maintenance. ICSM 2001.

[17]  Ken-ichi Matsumoto,et al.  Predicting Re-opened Bugs: A Case Study on the Eclipse Project , 2010, 2010 17th Working Conference on Reverse Engineering.

[18]  David Lo,et al.  Automated library recommendation , 2013, 2013 20th Working Conference on Reverse Engineering (WCRE).

[19]  David Lo,et al.  An Empirical Study of Bugs in Software Build Systems , 2013, 2013 13th International Conference on Quality Software.

[20]  Linyuan Lü,et al.  Link Prediction in Complex Networks: A Survey , 2010, ArXiv.

[21]  Charles Elkan,et al.  Link Prediction via Matrix Factorization , 2011, ECML/PKDD.

[22]  Hung Viet Nguyen,et al.  Build code analysis with symbolic evaluation , 2012, 2012 34th International Conference on Software Engineering (ICSE).