Towards quantifying the development value of code contributions

Quantifying the value of developers’ code contributions to a software project requires more than simply counting lines of code or commits. We define the development value of code as a combination of its structural value (the effect of code reuse) and its non-structural value (the impact on development). We propose techniques to automatically calculate both components of development value and combine them using Learning to Rank. Our preliminary empirical study shows that our analysis yields richer results than those obtained by human assessment or simple counting methods and demonstrates the potential of our approach.
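As an illustration only, the idea of combining the two components with Learning to Rank can be sketched as a pairwise ranker over two scores per contribution. Everything in this sketch is an assumption rather than the authors' method: the feature values, the preference pairs, the function names `train_pairwise_ranker` and `development_value`, and the perceptron-style update, which stands in for whichever Learning to Rank algorithm is actually used.

```python
from typing import List, Tuple

# Hypothetical per-contribution features: (structural_value, non_structural_value).
Features = Tuple[float, float]


def train_pairwise_ranker(
    features: List[Features],
    preferences: List[Tuple[int, int]],  # (i, j): contribution i should rank above j
    epochs: int = 100,
    lr: float = 0.1,
) -> List[float]:
    """Learn linear weights from pairwise preferences (perceptron-style updates)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in preferences:
            # Feature difference of the preferred item minus the other item.
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair is mis-ordered (or tied): nudge the weights
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def development_value(w: List[float], x: Features) -> float:
    """Combined score: weighted sum of the two component values."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Made-up example data: four contributions and four human-style preference pairs.
commits: List[Features] = [(0.9, 0.2), (0.1, 0.8), (0.5, 0.5), (0.05, 0.1)]
prefs = [(0, 1), (0, 2), (2, 3), (1, 3)]

weights = train_pairwise_ranker(commits, prefs)
ranking = sorted(
    range(len(commits)),
    key=lambda i: development_value(weights, commits[i]),
    reverse=True,
)
print("weights:", weights)
print("ranking (best first):", ranking)
```

The learned weights indicate how much each component contributes to the combined score under the given preferences; with real data, the preference pairs would come from whatever ground truth the ranking is trained against.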
