Software Ingredients: Detection of Third-Party Component Reuse in Java Software Release

A software product often depends on a large number of third-party components. To assess potential risks, such as security vulnerabilities and license violations, release engineers and security analysts need a list of the components in a product and their versions. Since such a list is not always available, a code comparison technique named Software Bertillonage has been proposed to test whether a product likely includes a copy of a particular component. Although the technique can extract candidate reused components, a user still has to manually identify the original components among the candidates. In this paper, we propose a method to automatically select the most likely origin of components reused in a product, based on the assumption that a product tends to include an entire copy of a component rather than a partial copy. More concretely, given a Java product and a repository of jar files of existing components, our method greedily selects the jar files that can provide Java classes to the product. To compare the method with the existing technique, we have conducted an evaluation using randomly created jar files including up to 1,000 components. The Software Bertillonage technique reports many candidates; its precision and recall are 0.357 and 0.993, respectively. Our method reports a list of original components whose precision and recall are 0.998 and 0.997, respectively.
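The greedy selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `select_origins`, the representation of each class as an opaque signature string, and the rule that a jar qualifies only when all of its classes appear in the product (our reading of the "entire copy" assumption) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set


def select_origins(product: Set[str], repository: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick jars that explain the most not-yet-explained product classes.

    `product` holds class signatures found in the product; `repository` maps a
    jar name to the class signatures it contains. Both are hypothetical inputs.
    """
    remaining = set(product)
    selected: List[str] = []
    while remaining:
        best_name, best_cover = None, set()
        for name in sorted(repository):  # sorted for a deterministic tie-break
            classes = repository[name]
            # Assumption: only jars whose classes all occur in the product
            # (an entire copy) are considered as possible origins.
            if not classes or not classes <= product:
                continue
            cover = classes & remaining
            if len(cover) > len(best_cover):
                best_name, best_cover = name, cover
        if best_name is None:
            break  # the remaining classes match no known jar
        selected.append(best_name)
        remaining -= best_cover
    return selected


if __name__ == "__main__":
    product = {"a.A#1", "a.B#1", "b.C#1", "app.Main#1"}
    repository = {
        "liba-1.0.jar": {"a.A#1", "a.B#1"},
        "liba-1.1.jar": {"a.A#1", "a.B#2"},  # only a partial match
        "libb-2.0.jar": {"b.C#1"},
    }
    print(select_origins(product, repository))
```

In this toy input, `liba-1.1.jar` shares one class with the product but is skipped because it is not contained in full, which is how the sketch separates a likely origin from a mere candidate.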
