Better Technical Debt Detection via SURVEYing

Software analytics can be improved by surveying; i.e. rechecking and (possibly) revising the labels offered by prior analysis. Surveying is time-consuming, so effective surveyors must manage their time carefully. Specifically, they must balance the cost of further surveying against the additional benefit of that extra effort. This paper proposes SURVEY0, an incremental Logistic Regression estimation method that implements this cost/benefit analysis. A classifier ranks the as-yet-unvisited examples according to how interesting they might be. Humans then review the most interesting examples, after which their feedback is used to update an estimator of how many interesting examples remain. This paper evaluates SURVEY0 in the context of self-admitted technical debt. As software projects mature, they can accumulate "technical debt"; i.e. developer decisions that are sub-optimal and decrease the overall quality of the code. Programmers often comment on such decisions in the code itself; i.e. the debt is self-admitted technical debt (SATD). Recent results show that text classifiers can automatically detect such debt. We find that we can significantly outperform prior results by SURVEYing the data. Specifically, for ten open-source Java projects, we can find 83% of the technical debt via SURVEY0 using just 16% of the comments (and if higher levels of recall are required, SURVEY0 can adjust towards that with some additional effort).
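The rank, review, and re-estimate loop described above can be illustrated with a minimal sketch. This is not the SURVEY0 implementation: the function names (`tokenize`, `score`, `survey`), the word-count scorer, the batch size, and the toy comments are all assumptions made for illustration. In particular, the paper's incremental Logistic Regression estimator is replaced here by a much cruder stand-in that extrapolates the hit rate of the latest batch to the unreviewed comments.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def score(text, pos_words, neg_words):
    """Sum of smoothed log-odds per word; higher means 'more likely debt'."""
    return sum(
        math.log((pos_words[w] + 1) / (neg_words[w] + 1)) for w in tokenize(text)
    )


def survey(comments, oracle, batch_size=2, target_recall=0.8):
    """Rank unreviewed comments, ask the human oracle about the top batch,
    update the scorer, and stop once estimated recall reaches the target.

    Returns (indices labeled as debt, number of comments reviewed).
    """
    pos_words, neg_words = Counter(), Counter()
    unreviewed = list(range(len(comments)))
    found, reviewed = [], 0
    while unreviewed:
        # Most "interesting" comments first, according to the current scorer.
        unreviewed.sort(
            key=lambda i: score(comments[i], pos_words, neg_words), reverse=True
        )
        batch, unreviewed = unreviewed[:batch_size], unreviewed[batch_size:]
        hits = 0
        for i in batch:
            is_debt = oracle(i)  # human feedback on one comment
            reviewed += 1
            (pos_words if is_debt else neg_words).update(tokenize(comments[i]))
            if is_debt:
                found.append(i)
                hits += 1
        # Crude stand-in estimator (NOT the paper's logistic regression):
        # assume the rest of the pool matches the latest batch's hit rate.
        estimated_total = len(found) + (hits / len(batch)) * len(unreviewed)
        if found and len(found) >= target_recall * estimated_total:
            break  # further review is estimated to add little benefit
    return found, reviewed


# Hypothetical toy data: comment texts plus hidden ground-truth labels.
comments = [
    "TODO fix this hack later",
    "returns the user name",
    "FIXME temporary workaround hack",
    "loop over all items",
    "TODO ugly workaround remove later",
    "compute the checksum",
    "parse the input file",
    "open the socket",
    "close the stream",
    "sort the list",
]
labels = [True, False, True, False, True] + [False] * 5

found, reviewed = survey(comments, lambda i: labels[i])
print(f"found {len(found)} debt comments after reviewing {reviewed} of {len(comments)}")
```

The stopping test is where the cost/benefit trade-off lives: raising `target_recall` makes the loop review more comments before it stops, which mirrors the abstract's remark that higher recall can be reached with additional effort.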
