Detecting "0-Day" Vulnerability: An Empirical Study of Secret Security Patch in OSS

Security patches in open source software (OSS) not only fix identified vulnerabilities but also expose the vulnerable code to attackers. Well-resourced attackers can therefore misuse this information to launch N-day attacks against unpatched OSS versions. The best practice for preventing such N-day attacks is to upgrade the software to the latest version as soon as possible. However, out of concern for their reputation, or to simplify software development management, vendors may choose to secretly patch vulnerabilities in a new release without reporting them to the CVE list or even describing them explicitly in their change logs. When such secretly patched vulnerabilities are identified by attackers, they can be turned into powerful "0-day" attacks, which can be exploited to compromise not only unpatched versions of the same software but also similar types of OSS (e.g., SSL libraries) that may contain the same vulnerability due to code cloning or similar design/implementation logic. It is therefore critical to identify secret security patches and downgrade the risk of such "0-day" attacks to at most "N-day" attacks. In this paper, we develop a defense system and implement a toolset to automatically identify secret security patches in open source software. To distinguish security patches from other patches, we first build a security patch database that maps more than 4,700 security patches to records in the CVE list. Next, we identify a set of features that help distinguish security patches from non-security ones using machine learning. Finally, we use code clone detection to discover similar patches or vulnerabilities in similar types of OSS. The experimental results show that our approach achieves good detection performance. A case study on OpenSSL, LibreSSL, and BoringSSL uncovers 12 secret security patches.
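The pipeline sketched above (extract features from a patch diff, classify it, then search for clones of the patched code) can be illustrated with a minimal, hypothetical sketch. The feature names, keyword list, and token-level Jaccard similarity below are illustrative stand-ins, not the paper's actual feature set or clone-detection algorithm:

```python
import re

# Hypothetical keyword list; real systems use far richer lexical/syntactic features.
SECURITY_KEYWORDS = ("overflow", "bounds", "sanitize", "free", "length", "null")

def extract_patch_features(diff_text):
    """Extract a few simple features from a unified diff for a patch classifier."""
    lines = diff_text.splitlines()
    added = [l[1:] for l in lines if l.startswith("+") and not l.startswith("+++")]
    removed = [l[1:] for l in lines if l.startswith("-") and not l.startswith("---")]
    changed = " ".join(added + removed).lower()
    return {
        "hunks": diff_text.count("@@") // 2,          # each hunk header has two "@@"
        "added_lines": len(added),
        "removed_lines": len(removed),
        # Security fixes often add new guard conditions.
        "adds_condition": sum(1 for l in added if re.search(r"\bif\s*\(", l)),
        "keyword_hits": sum(changed.count(k) for k in SECURITY_KEYWORDS),
    }

def token_jaccard(code_a, code_b):
    """Token-set Jaccard similarity: a crude stand-in for code clone detection."""
    tokenize = lambda s: set(re.findall(r"[A-Za-z_]\w+", s))
    a, b = tokenize(code_a), tokenize(code_b)
    return len(a & b) / len(a | b) if a | b else 0.0

if __name__ == "__main__":
    diff = (
        "@@ -1,2 +1,3 @@\n"
        " len = strlen(buf);\n"
        "+if (len > MAX_LEN) return -1;\n"
        " memcpy(dst, buf, len);\n"
    )
    print(extract_patch_features(diff))
    print(token_jaccard("memcpy(dst, buf, len)", "memcpy(dest, src, len)"))
```

In practice the extracted feature vectors would be fed to a trained classifier, and clone detection would use a scalable fingerprinting scheme (as in VUDDY or DECKARD) rather than pairwise token comparison.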
