An Empirical Study on the Removal of Self-Admitted Technical Debt

Technical debt refers to the phenomenon of taking shortcuts to achieve short-term gain at the cost of higher maintenance effort in the future. Recently, approaches have been developed to detect technical debt that developers acknowledge in code comments, referred to as Self-Admitted Technical Debt (SATD). Due to its importance, several studies have focused on the detection of SATD and examined its impact on software quality. However, preliminary findings showed that in some cases SATD may live in a project for a long time, i.e., more than 10 years. These findings suggest that not all SATD should be regarded as 'bad': some SATD needs to be removed, while other SATD may be fine to take on. Therefore, in this paper, we study the removal of SATD. In an empirical study on five open source projects, we examine how much SATD is removed and who removes it. We also investigate how long SATD lives in a project and what activities lead to its removal. Our findings indicate that the majority of SATD is removed and that most of it is self-removed (i.e., removed by the same person who introduced it). Moreover, we find that the median lifetime of SATD ranges from approximately 18 to 172 days. Finally, through a developer survey, we find that developers mostly use SATD to track future bugs and areas of the code that need improvement. Developers mostly remove SATD when they are fixing bugs or adding new features. Our findings contribute to the body of empirical evidence on SATD, in particular evidence pertaining to its removal.
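To make the three quantities in the abstract concrete (share of SATD removed, share that is self-removed, and median lifetime), the following is a minimal sketch over made-up toy records. The record layout, the names, and the `summarize` helper are all hypothetical and are not taken from the study; the sketch also ignores SATD that is still present when computing the median, which a full lifetime analysis would need to account for.

```python
from datetime import date
from statistics import median

# Hypothetical toy records, not data from the study:
# (introduced_by, introduced_on, removed_by, removed_on).
# removed_by and removed_on are None while the comment is still in the code.
records = [
    ("alice", date(2015, 1, 1), "alice", date(2015, 1, 11)),
    ("bob", date(2015, 2, 1), "carol", date(2015, 3, 3)),
    ("carol", date(2015, 3, 1), None, None),
    ("alice", date(2015, 4, 1), "alice", date(2015, 6, 30)),
]


def summarize(records):
    """Return the removed share, self-removed share, and median lifetime in days."""
    removed = [r for r in records if r[3] is not None]
    if not removed:
        return {
            "removed_share": 0.0,
            "self_removed_share": None,
            "median_lifetime_days": None,
        }
    # Self-removed: the remover is the same person who introduced the comment.
    self_removed = [r for r in removed if r[0] == r[2]]
    # Lifetime: days between introduction and removal (removed comments only).
    lifetimes = [(r[3] - r[1]).days for r in removed]
    return {
        "removed_share": len(removed) / len(records),
        "self_removed_share": len(self_removed) / len(removed),
        "median_lifetime_days": median(lifetimes),
    }


result = summarize(records)
print(result)
```

In this toy data, three of four comments are removed, two of those three by their original author, and the median lifetime of the removed comments is 30 days.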
