On the Detection of Community Smells Using Genetic Programming-based Ensemble Classifier Chain

Community smells are symptoms of organizational and social issues within a software development community that often increase project costs and degrade software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns in the organizational and social structures of a software development community, such as a lack of communication, coordination, and collaboration. Recognizing the advantages of detecting potential community smells early in a software project, we introduce a novel approach that learns from various community organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect eight common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; each of these is solved using genetic programming (GP) to find the optimal detection rules for its smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open source projects and 407 community smell instances. Statistical tests on our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, outperforming other state-of-the-art techniques. Furthermore, we found that the most influential factors characterizing community smells include the social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
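To illustrate how an ensemble classifier chain turns one multi-label problem into several single-label ones, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: a brute-force threshold rule (`fit_stump`) stands in for the GP-evolved detection rules, the chains differ only in their random label order (the usual bootstrap resampling of the training data is omitted), and all class and function names are hypothetical.

```python
import random


def fit_stump(X, y):
    """Stand-in for the GP rule search (assumption): choose the single
    feature/threshold/polarity rule with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, 0):
                errors = sum(
                    (polarity if row[j] >= t else 1 - polarity) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    _, j, t, polarity = best
    return lambda row: polarity if row[j] >= t else 1 - polarity


class ClassifierChain:
    """One binary rule per label; each link also sees the earlier labels."""

    def __init__(self, order):
        self.order = list(order)
        self.rules = {}

    def fit(self, X, Y):
        augmented = [list(row) for row in X]
        for label in self.order:
            y = [labels[label] for labels in Y]
            self.rules[label] = fit_stump(augmented, y)
            # Append the true label as an extra feature for later links.
            for row, value in zip(augmented, y):
                row.append(value)
        return self

    def predict(self, row):
        features = list(row)
        out = {}
        for label in self.order:
            out[label] = self.rules[label](features)
            # At prediction time the predicted label is passed down the chain.
            features.append(out[label])
        return [out[k] for k in sorted(out)]


class EnsembleClassifierChain:
    """Several chains with random label orders, combined by majority vote."""

    def __init__(self, n_labels, n_chains=5, seed=0):
        rng = random.Random(seed)
        self.chains = [
            ClassifierChain(rng.sample(range(n_labels), n_labels))
            for _ in range(n_chains)
        ]

    def fit(self, X, Y):
        for chain in self.chains:
            chain.fit(X, Y)
        return self

    def predict(self, row):
        votes = [chain.predict(row) for chain in self.chains]
        # A label is predicted when at least half of the chains vote for it.
        return [int(2 * sum(col) >= len(votes)) for col in zip(*votes)]
```

In use, `X` would hold one row of organizational-social metrics per project and `Y` one 0/1 entry per smell type, so `EnsembleClassifierChain(n_labels=8).fit(X, Y)` would correspond to the eight-smell setting described above.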
