License usage and changes: a large-scale study on GitHub

Open source software licenses determine, from a legal point of view, under which conditions software can be integrated and redistributed. The reason why developers of a project adopt (or change) a license may depend on various factors, e.g., the need to ensure compatibility with certain third-party components, their perspective on redistribution or commercialization of the software, or the need to protect against somebody else’s commercial usage of the software. This paper reports a large empirical study aimed at quantitatively and qualitatively investigating when and why developers adopt or change software licenses. Specifically, we first identify license changes in 1,731,828 commits, representing the entire history of 16,221 Java projects hosted on GitHub. Then, to understand the rationale of license changes, we perform a qualitative analysis of 1,160 projects written in seven different programming languages, namely C, C++, C#, Java, JavaScript, Python, and Ruby. Following an open coding approach inspired by grounded theory, we analyze commit messages and issue tracker discussions concerning licensing topics and, whenever possible, try to build traceability links between discussions and changes. On the one hand, our results highlight how, in different contexts, license adoption or changes can be triggered by various reasons. On the other hand, the results also highlight a lack of traceability of when and why licensing changes are made. This can be a major concern, because a change in the license of a system can negatively impact those that reuse it. In conclusion, the results of the study point to the need for better tool support in guiding developers in choosing/changing licenses and in keeping track of the rationale of license changes.
