Automatic Classification of Vulnerabilities using Deep Learning and Machine Learning Algorithms

As the field of computer science has advanced over the years, there has been a tremendous increase in the software being created, and this increase has been accompanied by an increase in the number of software vulnerabilities. A software vulnerability is a security flaw found in software that can potentially be exploited by attackers to perform cyber attacks. Since automatic approaches for identifying and analyzing vulnerabilities have become a trending topic in research, community, the classification of vulnerability is still an open issue. Developers need to know more about characteristics and types of vulnerabilities in systems to adopt suitable countermeasures in current and next versions. With this paper, we investigate whether vulnerability descriptions alone can be used to identify the type of vulnerability, by comparing five shallow learning models and fourteen deep learning models. The model with the highest F1-score was the Stacking-DNN (98.8%). On performing comprehensive analysis, the experiments demonstrate that both shallow and deep classifiers show comparable performance when trained and tested using the dataset without duplicates, while shallow classifiers showed better performance when trained and tested using the dataset with duplicates.