An Empirical Study of Secret Security Patch in Open Source Software

Security patches for Open Source Software (OSS) pinpoint vulnerable source code and provide fixes, which attackers can misuse to generate exploits for N-day attacks. Although the best defense against such N-day attacks is to patch software in a timely manner, this becomes challenging when a system bundles multiple OSS components, each receiving a large number of patches that mix security fixes, bug fixes, and new features. Even worse, software vendors may secretly patch vulnerabilities without reporting them to CVE or describing them explicitly in change logs. As a result, well-resourced attackers may compromise not only unpatched versions of the same software, but also other software with similar functionality, due to code cloning or shared logic. We consider this one type of “0-day” vulnerability. Since such secretly patched vulnerabilities should be correctly identified and fixed with high priority, we develop a machine learning based toolset to help distinguish security patches from non-security patches. We then conduct an empirical analysis of three popular open source SSL libraries to study the existence of secret security patches. Our experimental results suggest that a joint effort is needed to eliminate this type of “0-day” attack introduced by secret patches.
