Development of recommendation systems for software engineering: the CROSSMINER experience

To perform their daily tasks, developers make intensive use of existing resources by consulting open source software (OSS) repositories. Such platforms contain rich data sources, e.g., code snippets, documentation, and user discussions, that can be useful for supporting development activities. Over the last decades, several techniques and tools have been proposed to provide developers with innovative features, aiming to reduce development effort and cost and to improve productivity. In the context of the EU H2020 CROSSMINER project, a set of recommendation systems has been conceived to assist software programmers in different phases of the development process. The systems provide developers with various artifacts, such as third-party libraries, documentation about how to use the APIs being adopted, or relevant API function calls. To develop such recommendation systems, various technical choices have been made to overcome issues including the lack of baselines, limited data availability, and decisions about performance measures and evaluation approaches. This paper is an experience report presenting the knowledge gained from the set of recommendation systems developed through the CROSSMINER project. We explain in detail the challenges we had to deal with, together with the related lessons learned when developing and evaluating these systems. Our aim is to provide the research community with concrete takeaway messages that are expected to be useful for those who want to develop or customize their own recommendation systems. The reported experiences can facilitate discussions and research work, which in the end contribute to the advancement of recommendation systems applied to solve different issues in Software Engineering.
