Analysis of license inconsistency in large collections of open source projects

Free and open source software (FOSS) plays an important role in source code reuse. FOSS projects usually come with one or more software licenses written in the header of each source file, stating the requirements and conditions that must be followed when the code is reused. When a re-distributor removes or modifies the license statement, the license of the copy becomes inconsistent with that of its ancestor, which may lead to license infringement. In this paper, we describe and categorize different types of license inconsistencies and propose a method to detect them. We then apply this method to Debian 7.5 and to a collection of 10,514 Java projects on GitHub, and present the license inconsistency cases found in these systems. Through manual analysis, we summarize various reasons behind these license inconsistency cases, some of which imply potential license infringement and require attention from the developers. This analysis also exposes the difficulty of discovering license infringements, highlighting the usefulness of finding and maintaining source code provenance.
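
The abstract does not spell out the detection method, so the following is only a minimal sketch of one way such a check could work, not the authors' implementation. It assumes that files whose code is identical once comments and whitespace are ignored are copies of one another, and that the license can be recognized from comment text with a few keyword patterns. The names `LICENSE_PATTERNS`, `detect_license`, `code_fingerprint`, and `find_inconsistencies` are hypothetical, and the regex-based comment stripping handles only C-style comments and ignores string literals.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, deliberately tiny set of license signatures (assumption).
# A real study would rely on a dedicated license identification tool.
LICENSE_PATTERNS = {
    "GPL": re.compile(r"GNU General Public License", re.I),
    "Apache-2.0": re.compile(r"Apache License,? Version 2\.0", re.I),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.I),
}

# Matches C-style block and line comments (naive: ignores string literals).
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)


def detect_license(source):
    """Return the first license whose signature appears in the comments."""
    comments = " ".join(COMMENT_RE.findall(source))
    # Collapse whitespace and comment decoration so wrapped lines still match.
    comments = re.sub(r"[\s*/]+", " ", comments)
    for name, pattern in LICENSE_PATTERNS.items():
        if pattern.search(comments):
            return name
    return "NONE"


def code_fingerprint(source):
    """Hash the code with comments and all whitespace removed."""
    code = COMMENT_RE.sub("", source)
    code = re.sub(r"\s+", "", code)
    return hashlib.sha1(code.encode("utf-8")).hexdigest()


def find_inconsistencies(files):
    """Group files by code fingerprint; keep groups with differing licenses."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[code_fingerprint(source)].append((path, detect_license(source)))
    return [
        sorted(members)
        for members in groups.values()
        if len({lic for _, lic in members}) > 1
    ]


# Illustrative input: two copies of the same code, one with its header removed.
demo = {
    "a/f.c": "/* Licensed under the GNU General Public License */\n"
             "int f() { return 1; }\n",
    "b/f.c": "// no license\nint f() {\n  return 1;\n}\n",
    "c/g.c": "/* Apache License, Version 2.0 */\nint g() { return 2; }\n",
}
print(find_inconsistencies(demo))
```

In this toy input, `a/f.c` and `b/f.c` contain the same code but only one of them carries a license statement, so they are reported as one inconsistent group, while `c/g.c` has no copy and is not reported. Deciding whether such a group reflects an actual infringement still requires the kind of manual analysis the abstract describes.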
