Source Codes Classification Using a Modified Instruction Count Pass

A vulnerability is a flaw in a system’s implementation that may have severe consequences, so such flaws should be detected and managed. Several lines of research propose detecting them through static analysis of the original source code. Static analysis has notable disadvantages, however: it is slower than compilation and produces a high false-positive rate. In this project, we introduce a prediction technique that uses the output of one of the LLVM passes, “InstCount”. A classifier was built from the output of this pass on 500 source code samples written in C and C++, and it achieved 88% accuracy. A comparison between our classifier and the Clang static analyzer showed that the classifier outperformed the analyzer in predicting the existence of memory leaks and null pointers. The experiment also showed that the classifier could be applied alongside, or integrated with, static analysis tools for more efficient results.
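As a rough illustration of how instruction counts could become classifier inputs, the sketch below parses statistics text of the kind LLVM prints for the InstCount pass and turns it into a fixed-order feature vector. The line format ("<count> instcount - Number of <name>") is an assumption based on LLVM's usual statistics output and may differ between versions; the function names, the sample text, and the column list are hypothetical, and neither the modified pass nor the classifier described above is reproduced here.

```python
import re

# Assumed shape of one statistics line:
#   "<count> instcount - Number of <name>"
STAT_LINE = re.compile(r"^\s*(\d+)\s+instcount\s+-\s+Number of (.+?)\s*$")


def parse_instcount(stats_text):
    """Return {statistic name: count} for every instcount line in stats_text."""
    features = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            features[match.group(2)] = int(match.group(1))
    return features


def to_vector(features, columns):
    """Order the counts by a fixed column list; missing statistics count as 0."""
    return [features.get(name, 0) for name in columns]


# Hypothetical sample text, not taken from the paper's data set.
SAMPLE = """\
 4 instcount - Number of Alloca insts
 2 instcount - Number of Call insts
 7 instcount - Number of Load insts
 1 instcount - Number of basic blocks
"""

if __name__ == "__main__":
    counts = parse_instcount(SAMPLE)
    columns = ["Alloca insts", "Call insts", "Load insts", "Store insts"]
    print(to_vector(counts, columns))
```

Keeping the column list fixed across all samples means every source file maps to a vector of the same length, which is what most off-the-shelf classifiers expect.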
