Characterizing and Mitigating Self-Admitted Build Debt

Technical debt is a metaphor for trading long-term code quality for short-term goals in software projects. In recent years, the concept of self-admitted technical debt (SATD) was proposed, which focuses on debt that developers intentionally introduce and describe themselves. Although prior work has made important observations about SATD in source code, little is known about SATD in build systems. In this paper, we coin the term Self-Admitted Build Debt (SABD) and, through a qualitative analysis of 500 SABD comments in the Maven build systems of 300 projects, we characterize SABD by location and rationale (reason and purpose). Our results show that limitations in tools and libraries and complexities of dependency management are the most frequent causes, accounting for 49% and 23% of the comments, respectively. We also find that developers often document SABD as issues to be fixed later. To automate the detection of SABD rationale, we train classifiers to label comments according to the surrounding document content. The classifier performance is promising, achieving an F1-score of 0.67–0.75. Finally, among 16 identified ‘ready-to-be-addressed’ SABD instances, the three submitted as pull requests and the five submitted as issue reports were resolved after developers were made aware of them. Our work presents the first step towards understanding technical debt in build systems and opens up avenues for future work, such as tool support to track and manage SABD backlogs.
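To make the notion of an SABD comment concrete, the following minimal sketch pulls XML comments out of a Maven `pom.xml` and flags those that contain a debt marker. It is an illustration only and not the classifier described above: plain keyword matching is swapped in for the trained models, and the marker list, the function name `find_candidate_sabd`, and the example POM are all assumptions introduced here.

```python
import re

# Hypothetical marker list, chosen for illustration; the study does not prescribe one.
DEBT_MARKERS = ("todo", "fixme", "hack", "workaround", "temporary")

# Matches XML comments, including ones that span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_candidate_sabd(pom_text):
    """Return the XML comments in a pom.xml string that contain a debt marker."""
    candidates = []
    for match in COMMENT_RE.finditer(pom_text):
        # Collapse line breaks and indentation inside the comment.
        comment = " ".join(match.group(1).split())
        lowered = comment.lower()
        if any(marker in lowered for marker in DEBT_MARKERS):
            candidates.append(comment)
    return candidates


# Made-up build file with one debt comment and one ordinary comment.
EXAMPLE_POM = """<project>
  <dependencies>
    <!-- TODO: remove this exclusion once the upstream library fixes its
         transitive dependency -->
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>demo-lib</artifactId>
      <version>1.0</version>
    </dependency>
    <!-- Logging API -->
  </dependencies>
</project>"""

if __name__ == "__main__":
    for comment in find_candidate_sabd(EXAMPLE_POM):
        print(comment)
```

Keyword matching of this kind can only surface candidate comments. Assigning a reason or purpose to each one, as the study does, would still require manual labeling or a trained classifier.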
