A Comparison of Word Frequency and N-Gram Based Vulnerability Categorization Using SOM

Network attackers exploit software vulnerabilities on networked computers to carry out successful attacks. Many organizations keep track of known software vulnerabilities in the form of vulnerability databases. However, categorizing these vulnerabilities is difficult because of the large number of different attributes maintained for each entry. In this work we apply a data clustering algorithm, the self-organizing map (SOM), to two different representations of the information contained in an existing online vulnerability database: a word frequency representation and an n-gram representation. After identifying which of the two approaches is more valuable for this task, we are able to identify critical vulnerability features inherent in the dataset.
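The abstract does not say how either representation is built or how the SOM is trained, so the following is only a minimal sketch under stated assumptions. It turns free-text vulnerability descriptions into (a) relative word frequency vectors and (b) relative character trigram frequency vectors, then trains a small rectangular SOM with a Gaussian neighbourhood and a linearly decaying learning rate. The use of character trigrams (rather than word n-grams), the decay schedules, the grid size, and the helper names (`word_frequency_vector`, `ngram_vector`, `train_som`, `best_matching_unit`) are illustrative assumptions, and the sample descriptions are hypothetical, not entries from any real database.

```python
import math
import random
from collections import Counter


def word_frequency_vector(text, vocab):
    """Relative frequency of each vocabulary word in the text (assumed weighting)."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocab]


def char_ngrams(text, n=3):
    """Overlapping character n-grams; n=3 is an assumption, not from the paper."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


def ngram_vector(text, vocab, n=3):
    """Relative frequency of each n-gram in the vocabulary."""
    grams = char_ngrams(text, n)
    counts = Counter(grams)
    total = len(grams) or 1
    return [counts[g] / total for g in vocab]


def best_matching_unit(weights, x):
    """Grid node whose weight vector has the smallest squared Euclidean distance to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Online SOM training on a rows x cols grid (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    # Random initial weight vector for every grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                      # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)    # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):      # visit samples in shuffled order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])     # pull node toward sample
    return weights


# Hypothetical vulnerability descriptions, for illustration only.
descriptions = [
    "buffer overflow in ftp server allows remote attackers to execute arbitrary code",
    "buffer overflow in mail server allows remote attackers to execute arbitrary code",
    "cross site scripting in web forum allows remote attackers to inject script",
    "cross site scripting in web mail allows remote attackers to inject script",
]

word_vocab = sorted({w for d in descriptions for w in d.lower().split()})
gram_vocab = sorted({g for d in descriptions for g in char_ngrams(d)})

word_vecs = [word_frequency_vector(d, word_vocab) for d in descriptions]
gram_vecs = [ngram_vector(d, gram_vocab) for d in descriptions]

# Train one map per representation and compare where descriptions land.
for name, vecs in (("word frequency", word_vecs), ("n-gram", gram_vecs)):
    som = train_som(vecs, 3, 3, epochs=50)
    for d, v in zip(descriptions, vecs):
        print(name, best_matching_unit(som, v), d[:40])
```

Training a separate map per representation keeps the comparison simple: descriptions that share vocabulary should tend to land on the same or neighbouring nodes, and inspecting which representation separates the groups more cleanly is one hypothetical way to judge which approach is more useful.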