What's in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform

In addition to a git-based version control system, GitHub integrates several social coding features. In particular, GitHub users can star a repository, presumably to express interest in or satisfaction with an open source project. However, the real and practical meaning of starring a project has never been the subject of an in-depth and well-founded empirical investigation. Therefore, in this paper we provide a thorough study of the meaning, characteristics, and dynamic growth of GitHub stars. First, by surveying 791 developers, we report that three out of four developers consider the number of stars before using or contributing to a GitHub project. Then, we report a quantitative analysis of the characteristics of the 5,000 most-starred GitHub repositories. We propose four patterns to describe star growth, derived by clustering the time series that represent the number of stars of the studied repositories; we also reveal the perceptions of 115 developers about these growth patterns. To conclude, we provide a list of recommendations to open source project managers (e.g., on the importance of social media promotion) and to GitHub users and Software Engineering researchers (e.g., on the risks faced when selecting projects by GitHub stars).
