Trend Prediction of GitHub using Time Series Analysis

GitHub is a web-based collaborative platform for software developers, with around 28 million users and 57 million public repositories, that promotes social coding. It is an evolving network that changes as projects and repositories develop. GitHub's underlying working principle is based on events that occur over time, and the number of events in a repository is proportional to its popularity. In this work, we perform trend prediction on GitHub, divided into three tasks: repository trend prediction, language trend prediction and domain trend prediction. A multivariate time series is constructed from the most relevant and recurring events: Create, Fork, Pull Request, Push and Issue. Trends are then predicted by time series forecasting with a Long Short-Term Memory (LSTM) model. The prediction results compare repositories, languages and domains and help identify the trending ones. These insights can facilitate social coding and guide the selection of relevant languages and domains for upcoming projects.
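The construction of the multivariate time series can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the daily granularity, the event-type labels, the sample event log, and the helper names `build_daily_series` and `make_windows` are all hypothetical. The sketch counts events per day for each of the five event types, then frames the series as (lookback window, next day) pairs, which is one common way to prepare input for a sequence model such as an LSTM.

```python
from collections import Counter
from datetime import date, timedelta

# The five event types named in the abstract; the exact labels are assumptions.
EVENT_TYPES = ["Create", "Fork", "PullRequest", "Push", "Issues"]


def build_daily_series(events, start, end):
    """Return one row per day in [start, end], with one count per event type."""
    counts = Counter((day, kind) for day, kind in events if kind in EVENT_TYPES)
    n_days = (end - start).days + 1
    series = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        # A Counter returns 0 for (day, kind) pairs that never occurred.
        series.append([counts[(day, kind)] for kind in EVENT_TYPES])
    return series


def make_windows(series, lookback):
    """Pair each run of `lookback` days with the following day's vector."""
    inputs, targets = [], []
    for i in range(len(series) - lookback):
        inputs.append(series[i:i + lookback])
        targets.append(series[i + lookback])
    return inputs, targets


# Hypothetical event log for a single repository: (date, event type).
events = [
    (date(2019, 1, 1), "Create"),
    (date(2019, 1, 1), "Push"),
    (date(2019, 1, 2), "Push"),
    (date(2019, 1, 2), "Fork"),
    (date(2019, 1, 4), "Issues"),
    (date(2019, 1, 4), "Watch"),  # not one of the five types, so ignored
]

series = build_daily_series(events, date(2019, 1, 1), date(2019, 1, 4))
inputs, targets = make_windows(series, lookback=2)
```

Each element of `inputs` is a `lookback` x 5 matrix and each element of `targets` is the 5-value vector for the following day. A forecasting model would be trained to map the former to the latter; aggregating the per-repository counts by language or by domain before windowing would give the corresponding series for the other two tasks.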
