TopFilter: An Approach to Recommend Relevant GitHub Topics

Background: In software development, GitHub is a leading platform for storing, analyzing, and maintaining a large number of software repositories. GitHub introduced topics as a way to annotate stored repositories. However, repositories must be labeled carefully to avoid adverse effects on project popularity and reachability. Aims: We present TopFilter, a novel approach that assists open source software developers in selecting suitable topics for the GitHub repositories they create. Method: We represent repositories and their topics in a graph, build a project-topic matrix, and apply a syntactic similarity function to recommend missing topics. We assess TopFilter with ten-fold cross-validation using four metrics: success rate, precision, recall, and catalog coverage. Results: TopFilter recommends relevant topics, with a quality that depends on several factors: the collaborative filtering settings, the dataset considered, and the pre-processing applied. Moreover, TopFilter can be combined with a state-of-the-art topic recommender system (the MNB network) to improve overall prediction performance. Conclusion: Our results confirm that collaborative filtering techniques can be used successfully to provide relevant topics for GitHub repositories. In addition, TopFilter gains a significant boost in prediction performance when it uses the topics predicted by the MNB network as its initial set of topics.
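The recommendation step described under Method can be illustrated with a minimal sketch of neighbourhood-based collaborative filtering over a project-topic matrix. This is not the TopFilter implementation: the abstract does not specify the similarity function or the scoring rule, so the sketch assumes cosine similarity over binary topic sets and a similarity-weighted vote. The function names (`cosine`, `recommend`), the parameters `k` and `n`, and the toy repositories are all hypothetical.

```python
from math import sqrt


def cosine(a: set, b: set) -> float:
    """Cosine similarity between two binary topic vectors stored as sets.

    Assumption: the abstract only says "syntactic-based similarity";
    cosine is used here purely for illustration.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(repos: dict, target: str, k: int = 2, n: int = 3) -> list:
    """Suggest up to n topics missing from `target`.

    repos maps a repository name to its set of topics, i.e. one row of
    a binary project-topic matrix. Only the k most similar repositories
    with non-zero similarity are allowed to vote.
    """
    own = repos[target]
    scored = [
        (cosine(own, topics), name)
        for name, topics in repos.items()
        if name != target
    ]
    # Keep the k nearest neighbours, dropping unrelated repositories.
    neighbours = sorted((p for p in scored if p[0] > 0), reverse=True)[:k]

    # Each neighbour votes for its topics, weighted by its similarity.
    scores = {}
    for sim, name in neighbours:
        for topic in repos[name] - own:
            scores[topic] = scores.get(topic, 0.0) + sim

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


# Hypothetical toy data, not taken from the paper's datasets.
repos = {
    "repo-a": {"python", "machine-learning", "deep-learning"},
    "repo-b": {"python", "machine-learning", "tensorflow"},
    "repo-c": {"python", "deep-learning", "pytorch", "tensorflow"},
    "repo-d": {"javascript", "react"},
}
suggested = recommend(repos, "repo-a")
print(suggested)
```

In this toy example both neighbours of `repo-a` carry `tensorflow`, so it outranks `pytorch`, which only one neighbour has. Seeding `repos[target]` with topics predicted by another recommender, as the abstract describes for the MNB network, would only change the input set, not the voting logic.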
