Research on Software Project Developer Behaviors with K-means Clustering Analysis

Research on technical debt and community smells has drawn increasing attention in the software engineering research community over the past decade. Data mining methods have also been widely applied in this domain. However, few studies have used data mining methods to understand software project communities, especially with regard to developer behaviors. Using K-means clustering, this study provides a preliminary analysis of how open source software project developers can be classified based on statistics of their behaviors related to technical debt. The results show that developers can be categorized into three behavior groups: Veterans, Vulnerability Creators, and Fault Inducers/Commoners.
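As a rough illustration of the kind of analysis described above, the following minimal sketch clusters developers with K-means on per-developer behavior statistics. It is not the study's actual pipeline: the three feature columns (commits, vulnerabilities introduced, faults introduced), the developer rows, and the helper names are all hypothetical and invented for this example. Only the choice of k = 3 follows the three groups reported. Deterministic farthest-point seeding is swapped in for random initialization purely so that the toy run is reproducible.

```python
import math

# Hypothetical per-developer statistics, invented for illustration only:
# (commits, vulnerabilities introduced, faults introduced)
developers = [
    (900, 2, 5), (950, 1, 4), (1000, 3, 6),   # many commits, few issues
    (100, 40, 5), (120, 45, 6), (90, 42, 4),  # few commits, many vulnerabilities
    (110, 2, 30), (95, 3, 33), (105, 1, 28),  # few commits, many faults
]

def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def farthest_point_seeds(points, k):
    """Deterministic seeding (an assumption of this sketch, not the study's)."""
    seeds = [points[0]]
    while len(seeds) < k:
        # Pick the point whose nearest existing seed is farthest away.
        seeds.append(max(points,
                         key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels

points = standardize(developers)
centroids, labels = kmeans(points, k=3)

# Print each developer's raw statistics next to the assigned cluster index.
for row, lab in zip(developers, labels):
    print(lab, row)
```

Cluster indices carry no meaning by themselves. Naming a cluster (for example, as a Veterans-like group) would require inspecting its centroid against the original statistics, and on real data the number of clusters would typically be checked with a criterion such as the silhouette score rather than fixed in advance.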
