Results of the DARPA 1998 Offline Intrusion Detection Evaluation

DARPA sponsored the first realistic and systematic evaluation of research intrusion detection systems in 1998. As part of this evaluation, MIT Lincoln Laboratory developed a test network that simulated a medium-size government site. Background traffic was generated over two months using custom traffic generators that emulated hundreds of users on thousands of hosts performing a wide variety of tasks, producing a rich mixture of network traffic. While this background traffic was being generated, automated attacks were launched against three UNIX victim machines (SunOS, Solaris, Linux) located inside the simulated government site, behind a router. More than 300 instances of 38 different attacks were embedded in roughly two months of training data and two weeks of test data. Six DARPA research sites participated in a blind evaluation in which test data were provided without identifying the locations of the embedded attacks. Results were analyzed by generating receiver operating characteristic curves (ROCs), which show the attack detection rate as a function of the false alarm rate. Performance was evaluated separately for old attacks included in the training data, new attacks that occurred only in the test data, and novel, never-before-seen attacks developed specifically for this evaluation. Detection performance for the best systems was reasonable (above 60% correct) at 10 false alarms per day for both old and new probe attacks and for attacks in which a local user illegally becomes root (u2r). Intrusion detection systems trained on old probe or u2r attacks generalized well to other attacks in these same categories. Detection rates were worse for denial-of-service (dos) attacks and for attacks in which a remote user illegally accesses a local host (r2l), especially when those attacks were new or novel. Although detection accuracy for old attacks in these two categories was roughly 80%, detection accuracy for new and novel attacks was below 25% even at high false alarm rates.
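The ROC analysis described above can be illustrated with a minimal sketch. It assumes each alert an IDS emits carries a numeric score and the id of the embedded attack it matched, or no id if it is a false alarm. The `Alert` record, `roc_points`, and `detection_at` are hypothetical names, not the evaluation's actual scoring software. Counting an attack as detected once, by any alert at or above the threshold, and normalizing false alarms by days of test data matches how results are stated here (detection rate at N false alarms per day).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Alert:
    """One scored alert from an IDS (hypothetical record format)."""
    score: float              # higher means more suspicious
    attack_id: Optional[int]  # embedded attack this alert matched, or None for a false alarm


def roc_points(alerts: List[Alert], n_attacks: int, n_days: float) -> List[Tuple[float, float]]:
    """Return (false alarms per day, detection rate) pairs, one per distinct score threshold.

    An attack counts as detected if at least one alert matching it scores at or above
    the threshold; every unmatched alert at or above the threshold is a false alarm.
    """
    points = []
    for threshold in sorted({a.score for a in alerts}, reverse=True):
        kept = [a for a in alerts if a.score >= threshold]
        detected = {a.attack_id for a in kept if a.attack_id is not None}
        false_alarms = sum(1 for a in kept if a.attack_id is None)
        points.append((false_alarms / n_days, len(detected) / n_attacks))
    return points


def detection_at(points: List[Tuple[float, float]], max_fa_per_day: float) -> float:
    """Best detection rate reachable with at most max_fa_per_day false alarms per day."""
    return max((det for fa, det in points if fa <= max_fa_per_day), default=0.0)


# Toy example: 4 embedded attacks, 1 day of test data, 6 alerts (2 of them false alarms).
alerts = [Alert(0.9, 1), Alert(0.8, None), Alert(0.7, 2),
          Alert(0.6, 2), Alert(0.5, None), Alert(0.4, 3)]
points = roc_points(alerts, n_attacks=4, n_days=1)
print(points)
print(detection_at(points, max_fa_per_day=1.0))
```

Sweeping the threshold from the highest score downward traces the curve from few detections and few false alarms toward more of both; reading off the curve at a fixed false alarm rate, as `detection_at` does, gives the kind of figure quoted above ("above 60% correct at 10 false alarms per day"). Attack 4 is never alerted on, so the toy curve tops out at 75% detection.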
An intrusion detection system formed from the best components of the submitted systems performed much better than a baseline keyword-spotting system similar to many commercial and government systems. Across all 120 attacks in the test data, it reduced the false alarm rate by about two orders of magnitude (from roughly 600 false alarms per day to 6) and increased detection accuracy from roughly 20% to 60%. These results suggest that future intrusion detection research should move toward developing algorithms that find new attacks and away from older approaches that focus on writing rules to match known attack signatures. The current 1999 DARPA evaluation is extending the 1998 evaluation by adding Windows/NT victims, including new Windows/NT attacks, including insider attacks,