Virus detection with machine learning

Standard virus detection relies on signatures, which are short sequences of bytes taken from a virus. In this project we present an alternative approach that uses machine learning techniques to detect viruses based on their behaviour instead. A virtual environment was created to allow real viruses to propagate, and to obtain realistic results we attempted to simulate real-world computer usage on the virtual machines. A virus ‘observatory’ was then designed to visualise virus propagation. Windows perfmon counters were used to obtain a numeric representation of each computer’s state, and the counters from all machines were then visualised in a matrix format. We applied machine learning techniques to identify the name of the virus active on a computer, and we consider how information gain can be used to improve the accuracy of the classifiers. Finally, we attempted to detect viruses after training our system only on ‘normal’ activity, which opens the possibility of using it to detect unknown viruses.
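To illustrate the idea of ranking perfmon counters by information gain, the sketch below computes how much knowing a counter's value reduces uncertainty about which virus (if any) is active. It is a minimal sketch under stated assumptions: counter readings are assumed to have already been discretised into bins such as "low" and "high", the class labels and data are hypothetical, and the function names `entropy` and `information_gain` are illustrative rather than taken from this project.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on one discretised counter.

    Assumes feature_values[i] is the binned counter reading for sample i
    and labels[i] is that sample's class (e.g. 'clean' or a virus name).
    """
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each bin of the counter.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical samples: two clean machines and two infected ones.
    labels = ["clean", "clean", "virusA", "virusA"]
    informative = ["low", "low", "high", "high"]  # separates the classes
    uninformative = ["a", "b", "a", "b"]          # unrelated to the class
    print(information_gain(informative, labels))
    print(information_gain(uninformative, labels))
```

A counter that perfectly separates clean from infected samples scores 1 bit here, while one unrelated to the class scores 0; keeping only the highest-scoring counters is one common way to use such a ranking for feature selection before training a classifier.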