Predicting build failures using social network analysis on developer communication

Communication is a critical factor in work group coordination and has been studied extensively. Yet objective evidence of the relationship between successful coordination outcomes and communication structures is still missing. Using data from IBM's Jazz™ project, we study the communication structures of development teams with high coordination needs. We operationalize coordination outcome as the result of each team's code integration build (successful or failed) and describe team communication structures with social network measures. Our results indicate that developer communication plays an important role in the quality of software integrations. Although no individual measure could indicate whether a build would fail or succeed, we combined the communication structure measures into a predictive model that indicates whether an integration will fail. When applied to five project teams, our predictive model yielded recall values between 55% and 75% and precision values between 50% and 76%.
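To make the terms above concrete, the following minimal sketch computes two common social network measures (density and normalized degree centrality) for a small communication network, and then precision and recall for a set of build predictions. It is an illustration only, not the model used in the study: the function names, the developer names, the example ties, and the build labels are all hypothetical, and the abstract does not say which measures or which classifier the authors used.

```python
def density(nodes, edges):
    """Fraction of possible undirected ties that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-loops and count each undirected tie once.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(unique) / (n * (n - 1) / 2)


def degree_centrality(nodes, edges):
    """Number of distinct neighbours of each node, divided by n - 1."""
    n = len(nodes)
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    if n < 2:
        return {v: 0.0 for v in nodes}
    return {v: len(neighbours[v]) / (n - 1) for v in nodes}


def precision_recall(actual, predicted, positive="failed"):
    """Precision and recall for the `positive` class (here: failed builds)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical team: who exchanged messages with whom before one build.
developers = ["ann", "bob", "cat", "dan"]
ties = [("ann", "bob"), ("bob", "cat"), ("cat", "ann")]

team_density = density(developers, ties)
centrality = degree_centrality(developers, ties)

# Hypothetical outcomes for five builds versus a model's predictions.
actual = ["failed", "ok", "failed", "ok", "failed"]
predicted = ["failed", "failed", "ok", "ok", "failed"]
precision, recall = precision_recall(actual, predicted)
```

In this toy data, recall is the share of truly failed builds that the predictions flag, and precision is the share of flagged builds that really failed, which is how the percentage ranges reported above should be read.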
