An Exploratory Research of GitHub Based on Graph Model

GitHub hosts a large number of developers and open source projects. In this research, we use a property graph model to explore the entities of GitHub and the complex relationships among them. Using the dataset from the MSR 2014 data challenge, we attempt to answer three questions about GitHub. First, we propose a graph-based method to identify developers with cross-technology backgrounds. Second, we define metrics based on discrete entropy to analyze the imbalance among projects within a software family that is induced by commit activity. The results show that the imbalance in development size induced by root projects is greater than the imbalance in development speed. Finally, we rank the relatively important root projects using two link analysis methods, and the experimental results demonstrate that our method is effective.
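The abstract does not give the definitions of the entropy-based metrics, so the following is only a minimal sketch of how an imbalance score could be derived from discrete entropy. It assumes that imbalance is measured as one minus the normalized Shannon entropy of per-project commit counts within a software family. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the commit-count distribution, scaled to [0, 1].

    Assumption: `counts` holds one non-negative commit count per project
    in a software family (a root project and its forks).
    """
    total = sum(counts)
    if len(counts) < 2 or total <= 0:
        raise ValueError("need at least two projects and one commit")
    # Zero counts contribute nothing to the entropy (0 * log 0 is taken as 0).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # The maximum entropy for n projects is log(n), reached when all are equal.
    return entropy / math.log(len(counts))


def imbalance(counts):
    """Hypothetical imbalance score: 0 for an even spread, 1 when a single
    project holds every commit."""
    return 1.0 - normalized_entropy(counts)


if __name__ == "__main__":
    # Illustrative commit counts only, not data from the paper.
    print(imbalance([10, 10, 10, 10]))
    print(imbalance([97, 1, 1, 1]))
```

Under this assumption, a family whose commits are concentrated in one project scores close to 1, while a family with evenly spread commits scores close to 0. The same function could be applied to a speed-related quantity, such as commits per unit time, to compare the two kinds of imbalance.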
