On the Lack of Consensus Among Technical Debt Detection Tools

A vigorous and growing set of technical debt analysis tools has been developed in recent years, spanning both research prototypes and industrial products such as Structure 101, SonarQube, and DV8. Each of these tools identifies problematic files using its own definitions and measures. But to what extent do these tools agree on the files they flag as problematic? If the top-ranked files reported by these tools are largely consistent, then we can be confident in using any of them; otherwise, a question of accuracy arises. In this paper, we report the results of an empirical study in which 10 projects were analyzed with multiple tools. Our results show that: 1) these tools report very different results even for the most common measures, such as size, complexity, file cycles, and package cycles; 2) the tools also differ dramatically in the sets of problematic files they identify, since each implements its own definition of "problematic," and after normalizing by size, the most problematic file sets they report barely overlap; 3) code-based measures other than size and complexity do not even moderately correlate with a file's change-proneness or error-proneness, whereas co-change-related measures perform better. These results suggest that, to identify files with true technical debt, that is, files that experience excessive changes or bugs, co-change information must be considered; code-based measures alone are largely ineffective at pinpointing true debt. Finally, this study reveals the need for the community to create benchmarks and datasets to assess the accuracy of software analysis tools with respect to commonly used measures.
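
The two kinds of comparison described above, agreement on the sets of flagged files and correlation of measures with change- or error-proneness, can be made concrete with a small sketch. The snippet below is illustrative only and is not the authors' analysis scripts; the tool names, file rankings, cutoff, and measure values are hypothetical placeholders. It computes the Jaccard overlap between each pair of top-N "problematic" file sets and the Spearman rank correlation between a per-file measure and its revision count.

```python
# Minimal sketch (assumed setup, not the study's actual tooling):
# quantifying agreement between technical debt detection tools.
from itertools import combinations
from scipy.stats import spearmanr

# Hypothetical per-tool rankings: most problematic files first.
tool_rankings = {
    "ToolA": ["core/Parser.java", "util/Cache.java", "net/Session.java"],
    "ToolB": ["util/Cache.java", "ui/Window.java", "core/Parser.java"],
    "ToolC": ["db/Store.java", "net/Session.java", "ui/Window.java"],
}

TOP_N = 3  # stands in for a size-normalized cutoff, e.g. the top 10% of files

def jaccard(ranking_a, ranking_b):
    """Overlap of two top-N file sets: |A intersect B| / |A union B|."""
    a, b = set(ranking_a[:TOP_N]), set(ranking_b[:TOP_N])
    return len(a & b) / len(a | b)

# Pairwise agreement between the tools' flagged-file sets.
for (name1, r1), (name2, r2) in combinations(tool_rankings.items(), 2):
    print(f"{name1} vs {name2}: Jaccard overlap = {jaccard(r1, r2):.2f}")

# Correlating a code-based measure with change-proneness (illustrative values).
complexity   = [12, 4, 33, 7, 19]   # per-file code-based measure
change_count = [30, 5, 14, 8, 41]   # per-file number of revisions
rho, p = spearmanr(complexity, change_count)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A low pairwise Jaccard value indicates that the tools barely agree on which files are problematic, and a weak Spearman correlation indicates that the measure is a poor predictor of how often a file actually changes; both computations mirror the kind of evidence summarized in the abstract.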
