Development of recommendation systems for software engineering: the CROSSMINER experience

To perform their daily tasks, developers make intensive use of existing resources by consulting open source software (OSS) repositories. Such platforms contain rich data sources, e.g., code snippets, documentation, and user discussions, that can be useful for supporting development activities. Over the last decades, several techniques and tools have been proposed to provide developers with innovative features, aiming to reduce development effort and cost and to improve productivity. In the context of the EU H2020 CROSSMINER project, a set of recommendation systems has been conceived to assist software programmers in different phases of the development process. The systems provide developers with various artifacts, such as third-party libraries, documentation about how to use the APIs being adopted, or relevant API function calls. To develop such recommendation systems, various technical choices have been made to overcome issues related to several aspects, including the lack of baselines, limited data availability, the choice of performance measures, and evaluation approaches. This paper is an experience report presenting the knowledge gained from the set of recommendation systems developed through the CROSSMINER project. We explain in detail the challenges we had to deal with, together with the related lessons learned when developing and evaluating these systems. Our aim is to provide the research community with concrete takeaway messages that are expected to be useful for those who want to develop or customize their own recommendation systems. The reported experiences can facilitate discussions and research work that ultimately contribute to the advancement of recommendation systems applied to solve different issues in Software Engineering.
