Predicting software revision outcomes on GitHub using structural holes theory

Many software repositories are hosted publicly on online social platforms. Online users contribute to these software projects not only by providing feedback and suggestions but also by submitting revisions that improve software quality. This study takes a close look at such revisions and examines how social media networks affect the revision outcome. A novel approach that combines several research methods (e.g., ego-centric social network analysis, structural holes theory, and survival analysis) is used to build a comprehensible model that predicts the revision outcome. Predictive performance is validated on real-life datasets obtained from GitHub, the social coding website, comprising 32,962 pull requests that submit revisions, 20,399 distinct software project repositories, and a social network of 234,322 users. The model achieves good predictive performance, with an average AUC of 0.84. The results suggest that a repository host's position in the ego network plays an important role in determining how long it takes before a revision is accepted. Specifically, hosts positioned between densely connected social groups tend to accept revisions more quickly. The study demonstrates that online social networks are vital to software development and advances the understanding of collaboration in software development research. The proposed method can support decision making in software development by forecasting revision duration. The results also have several implications for managing project collaboration through social media.
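The abstract does not state which structural-holes measures enter the model. As an illustration only, the following Python sketch computes two standard measures from Burt's structural holes theory, effective size and aggregate constraint, for a host's ego network. It assumes an undirected, unweighted follower graph stored as a dictionary of adjacency sets in which every node appears as a key; the function names and toy graphs are hypothetical and are not taken from the study.

```python
from typing import Dict, Set

# Hypothetical representation: undirected, unweighted graph as adjacency sets.
# Every node must appear as a key, even if it has no neighbours.
Graph = Dict[str, Set[str]]


def proportional_tie(graph: Graph, i: str, j: str) -> float:
    """p_ij: share of i's ties invested in j (1 / degree(i) if tied, else 0)."""
    if j not in graph[i]:
        return 0.0
    return 1.0 / len(graph[i])


def constraint(graph: Graph, ego: str) -> float:
    """Burt's aggregate constraint on ego:
    sum over alters j of (p_ej + sum over other alters q of p_eq * p_qj) ** 2.
    Lower values mean ego's contacts are less redundant (more structural holes)."""
    total = 0.0
    for j in graph[ego]:
        indirect = sum(
            proportional_tie(graph, ego, q) * proportional_tie(graph, q, j)
            for q in graph[ego]
            if q != j
        )
        total += (proportional_tie(graph, ego, j) + indirect) ** 2
    return total


def effective_size(graph: Graph, ego: str) -> float:
    """Burt's effective size in Borgatti's unweighted form: n - 2t / n,
    where n = number of alters and t = number of ties among the alters."""
    alters = graph[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    t = sum(len(graph[a] & alters) for a in alters) / 2  # each tie counted twice
    return n - 2 * t / n


# Toy ego networks (hypothetical): host "h" bridges two disconnected pairs,
# while host "x" sits inside a single densely connected clique.
broker: Graph = {
    "h": {"a", "b", "c", "d"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h", "d"},
    "d": {"h", "c"},
}
clique_nodes = ["x", "a", "b", "c", "d"]
clique: Graph = {v: set(clique_nodes) - {v} for v in clique_nodes}

if __name__ == "__main__":
    print(effective_size(broker, "h"), constraint(broker, "h"))  # 3.0 0.5625
    print(effective_size(clique, "x"), constraint(clique, "x"))  # 1.0 0.765625
```

Lower constraint and higher effective size indicate that the ego spans more structural holes: in the toy example, the host bridging two separate groups scores as a broker, while the host embedded in one clique is more constrained. Measures like these could then serve as covariates in a survival model of revision duration, such as a Cox regression; that pairing is an assumption here rather than a description of the study's exact specification.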
