Micro-signatures: The Effectiveness of Known Bad N-Grams for Network Anomaly Detection

Network intrusion detection is broadly divided into signature detection and anomaly detection. The former identifies patterns associated with known attacks; the latter attempts to learn a ‘normal’ pattern of activity and alerts when behavior outside those norms is detected. The n-gram methodology has arguably been the most successful technique for network anomaly detection. In this work we find that, when training data is sanitized, n-gram anomaly detection is not primarily anomaly detection: it derives the majority of its performance from an implicit non-anomaly subsystem that neither uses typical signatures nor is anomaly based (though it is closely related to both). For our data, these “micro-signatures” provide the vast majority of the detection capability. This finding changes how we understand and approach n-gram based ‘anomaly’ detection. Understanding the foundational principles on which it operates lets us better explore how to improve it.
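The distinction drawn above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the byte-level n-grams, the choice of n, and the toy payloads are all assumptions made for illustration. The sketch splits a payload's unseen n-grams into those that match a set of known bad n-grams (the "micro-signature" component) and those that are merely novel (the anomaly component).

```python
def ngrams(data: bytes, n: int) -> set:
    """Return the set of distinct byte n-grams in a payload."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def train(normal_payloads, attack_payloads, n):
    """Build a 'good' n-gram set from normal traffic and a 'known bad'
    set from attack samples (hypothetical training inputs)."""
    good = set()
    for payload in normal_payloads:
        good |= ngrams(payload, n)
    bad = set()
    for payload in attack_payloads:
        bad |= ngrams(payload, n)
    # Keep only bad n-grams that never occur in normal traffic.
    bad -= good
    return good, bad


def score(payload, good, bad, n):
    """Return (micro_signature_fraction, anomaly_fraction) for a payload.

    micro_signature_fraction: share of n-grams that match known bad n-grams.
    anomaly_fraction: share of n-grams that are unseen but not known bad.
    """
    grams = ngrams(payload, n)
    if not grams:
        return 0.0, 0.0
    unseen = grams - good
    micro = len(unseen & bad) / len(grams)
    anomaly = len(unseen - bad) / len(grams)
    return micro, anomaly


# Toy example with made-up payloads and n = 3 (both are assumptions).
good, bad = train([b"GET /index.html"], [b"GET /../../etc/passwd"], n=3)
micro, anomaly = score(b"GET /etc/passwd", good, bad, n=3)
print(micro, anomaly)
```

In this toy case most of the score comes from matches against the known bad set rather than from novelty alone, which is the kind of effect the abstract describes; real traffic and a real detector would of course behave differently.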
