Technical Debt in the Peer-Review Documentation of R Packages: A rOpenSci Case Study

Context: Technical Debt (TD) is a metaphor used to describe code that is "not quite right." Although TD research has gained momentum, TD has not yet been studied as thoroughly in non-Object-Oriented (OO) or scientific software such as R. R is a multi-paradigm programming language whose popularity in data science and statistical applications has grown in recent years. Because R is extensible through user-contributed packages, several community-led organizations were created to organize and peer-review packages in a concerted effort to increase their quality. Nonetheless, it is well known that most R users come from a variety of disciplines and do not have a technical programming background. Objective: The goal of this study is to investigate TD in the documentation of the peer review of R packages led by rOpenSci. Method: We collected over 5,000 comments from 157 packages that had been reviewed and approved for publication at rOpenSci. We manually analyzed a sample of these comments, posted by package authors, rOpenSci editors, and reviewers during the review process, to investigate the types of TD present in the reviews. Results: The findings of our study include (i) a taxonomy of TD derived from our analysis of the peer reviews; (ii) documentation debt as the most prevalent type of debt; and (iii) evidence that different user roles are concerned with different types of TD. For instance, reviewers tend to report some types of TD more than other roles do, and the types of TD they report differ from those reported by the authors of a package. Conclusion: Analysis of TD in scientific software or in peer review is almost non-existent. Our study is a pioneering effort, though it is limited to the context of R packages. Nevertheless, our findings, together with our public datasets, can serve as a starting point for replication studies that perform similar analyses on other scientific software or investigate the rationale behind our findings.
