Automated Decomposition of Build Targets (Extended Version)

A (build) target specifies the information needed to automatically build a software artifact. This paper focuses on underutilized targets, an important dependency problem that we identified at Google. An underutilized target is one with files that are not needed by some of its dependents. Underutilized targets result in less modular code, overly large artifacts, slow builds, and unnecessary build and test triggers. To mitigate these problems, programmers decompose underutilized targets into smaller targets. However, manually decomposing a target is tedious and error-prone. We prove that finding the best target decomposition is NP-hard, and we therefore introduce a greedy algorithm that proposes a decomposition through iterative unification of the strongly connected components of the target. Our tool found that 19,994 of 40,000 Java library targets at Google can be decomposed into at least two targets. The results show that our tool is (1) efficient, because it analyzes a target in two minutes on average, and (2) effective, because for each of 1,010 targets, it would save at least 50% of the total execution time of the tests triggered by the target.
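To make these definitions concrete, here is a minimal Python sketch under stated assumptions. It models a target as a set of files plus a file-level dependency map, and models each dependent (hypothetical labels such as `//app:a`) by the files it uses directly; a dependent needs every file reachable from those. The target is underutilized if some dependent needs fewer than all of its files. For decomposition, the sketch computes strongly connected components with Tarjan's algorithm and then unifies components that are needed by exactly the same set of dependents. That merge rule is an assumption chosen for illustration; it is not claimed to be the authors' greedy algorithm or to yield the best decomposition. The function names (`needed_files`, `propose_decomposition`, `is_underutilized`) are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; `edges` maps a file to the files it depends on.

    Recursive, which is fine for the small graphs used in this sketch.
    """
    index, low, on_stack, stack, components = {}, {}, set(), [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def needed_files(used, edges):
    """Files a dependent needs: everything reachable from the files it uses."""
    seen, work = set(), list(used)
    while work:
        f = work.pop()
        if f not in seen:
            seen.add(f)
            work.extend(edges.get(f, ()))
    return seen


def is_underutilized(files, edges, dependents):
    """True if some dependent needs only a strict subset of the target's files."""
    return any(needed_files(used, edges) != set(files) for used in dependents.values())


def propose_decomposition(files, edges, dependents):
    """Unify SCCs that are needed by exactly the same set of dependents.

    ASSUMPTION: this merge rule is illustrative, not necessarily the paper's rule.
    `dependents` maps a (hypothetical) dependent label to the files it uses directly.
    """
    needs = {d: needed_files(used, edges) for d, used in dependents.items()}
    groups = defaultdict(set)
    for comp in strongly_connected_components(files, edges):
        # Reaching one file of an SCC reaches all of them, so one representative suffices.
        rep = next(iter(comp))
        key = frozenset(d for d, n in needs.items() if rep in n)
        groups[key] |= comp
    return sorted(sorted(g) for g in groups.values())
```

On a toy target where `Strings.java` and `StringsHelper.java` form a cycle, `Files.java` uses `Strings.java`, and `Net.java` stands alone, dependents `//app:a` (uses `Files.java`), `//app:b` (uses `Net.java`), and `//app:c` (uses `Strings.java`) yield three candidate targets: `[["Files.java"], ["Net.java"], ["Strings.java", "StringsHelper.java"]]`. Grouping by dependent set keeps the new targets acyclic: if file c depends on file d, every dependent that needs c also needs d, so an edge between two groups always runs from a smaller dependent set to a strictly larger one and can never close a cycle.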
