Automatic link extraction: The good, the bad and the ugly in software ecosystem mining

This abstract presents the pitfalls of automatic link extraction, based on our experience of manually investigating links in the metadata of the RubyGems package manager. This work can lead to automating link extraction in a way that avoids these pitfalls and produces more complete datasets for researchers investigating the multi-platform evolution of software ecosystems.
