The limits of automatic OS fingerprint generation

Remote operating system fingerprinting relies on implementation differences between OSs to identify the specific variant executing on a remote host. Because these differences can be subtle and difficult to find, most fingerprinting tools require expert manual effort to construct discriminative fingerprints and classification models. In prior work, Caballero et al. proposed a promising technique to eliminate manual intervention: the automatic generation of fingerprints using an approach similar to fuzz testing [6]. Their work evaluated the technique in a small-scale, carefully controlled test environment. In this paper, we re-examine automatic OS fingerprinting in a more challenging large-scale scenario to better understand the viability of the technique. In contrast to the prior work, we find that automatic fingerprint generation suffers from several limitations and technical hurdles that reduce its effectiveness, particularly in more demanding, realistic environments. We use machine learning algorithms from the well-known Weka [11] data mining toolkit to automatically generate fingerprints over 329 different machine instances, and we compare the accuracy of our automatically generated fingerprints to Nmap. Our results suggest that overfitting to non-OS-specific behavioral differences, the indistinguishability of different OS variants, the biasing of an automatic tool toward the makeup of the training data, and the inability of an automatic tool to exploit protocol and software semantics significantly limit the usefulness of this technique in practice. Automatic techniques can help identify candidate signatures, but our results suggest that manual expertise will remain an integral part of fingerprint generation.
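To make the approach concrete, the workflow described above can be sketched as supervised classification over probe-response features. The sketch below is illustrative only: the feature values, probe set, and OS labels are invented for the example and are not taken from the paper, and a simple one-rule learner stands in for the decision-tree algorithms (e.g., C4.5/J48) that Weka provides.

```python
# Minimal sketch of automatic OS fingerprint generation as supervised
# classification over probe-response features. All feature values and
# labels below are hypothetical, not drawn from the paper's data set.

from collections import Counter

# Each training host: (features, os_label). Features model responses to
# fixed probes, e.g. initial TTL, TCP window size, DF bit, options order.
TRAIN = [
    ({"ttl": 64,  "win": 5840,  "df": 1, "opts": "MSTNW"}, "Linux 2.6"),
    ({"ttl": 64,  "win": 5792,  "df": 1, "opts": "MSTNW"}, "Linux 2.6"),
    ({"ttl": 128, "win": 65535, "df": 1, "opts": "MNWNNTNNS"}, "Windows XP"),
    ({"ttl": 128, "win": 64240, "df": 1, "opts": "MNWNNTNNS"}, "Windows XP"),
    ({"ttl": 255, "win": 4128,  "df": 0, "opts": "M"}, "Cisco IOS"),
]

def learn_one_rule(train):
    """OneR-style rule induction: pick the single feature whose observed
    values best separate the OS labels on the training set. This is a
    stand-in for the richer tree learners the paper uses via Weka."""
    best = None
    for feat in train[0][0]:
        # Majority label for each observed value of this feature.
        buckets = {}
        for value, label in ((h[feat], lab) for h, lab in train):
            buckets.setdefault(value, Counter())[label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in buckets.items()}
        hits = sum(rule[h[feat]] == lab for h, lab in train)
        if best is None or hits > best[2]:
            best = (feat, rule, hits)
    feat, rule, _ = best
    # Unseen values fall back to the overall majority class -- one source
    # of the training-set bias the paper identifies.
    default = Counter(lab for _, lab in train).most_common(1)[0][0]
    return lambda host: rule.get(host.get(feat), default)

classify = learn_one_rule(TRAIN)
print(classify({"ttl": 128, "win": 64512, "df": 1, "opts": "MNWNNTNNS"}))
```

The sketch also hints at the failure modes the paper reports: a learner picking whichever feature happens to separate the training hosts can latch onto non-OS-specific differences, and hosts whose feature values never appeared in training are silently forced into the majority class.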

[1]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[2]  Douglas Comer,et al.  Probing TCP Implementations , 1994, USENIX Summer.

[3]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[4]  Shigeo Kaneda, et al.  C4.5: Programs for Machine Learning (book review), 1995.

[5]  Vern Paxson,et al.  Automated packet trace analysis of TCP implementations , 1997, SIGCOMM '97.

[6]  Vern Paxson  Automated packet trace analysis of TCP implementations, 1997.

[7]  Sally Floyd,et al.  On inferring TCP behavior , 2001, SIGCOMM.

[8]  Craig Smith,et al.  Know Your Enemy : Passive Fingerprinting , 2001 .

[9]  Ofir Arkin,et al.  The Present and Future of Xprobe2 The Next Generation of Active Operating System Fingerprinting , 2003 .

[10]  R. Lippmann, et al.  Passive Operating System Identification From TCP/IP Packet Headers, 2003.

[11]  Ryan Spangler,et al.  Analysis of Remote Active Operating System Fingerprinting Tools , 2003 .

[12]  Niels Provos,et al.  A Virtual Honeypot Framework , 2004, USENIX Security Symposium.

[13]  Robert Beverly,et al.  A Robust Classifier for Passive TCP/IP Fingerprinting , 2004, PAM.

[14]  Greg Taleck  SYNSCAN: Towards Complete TCP/IP Fingerprinting, 2004.

[15]  T. Kohno,et al.  Remote physical device fingerprinting , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[16]  Damon McCoy,et al.  Passive Data Link Layer 802.11 Wireless Device Driver Fingerprinting , 2006, USENIX Security Symposium.

[17]  François Gagnon,et al.  A Hybrid Approach to Operating System Discovery using Answer Set Programming , 2007, 2007 10th IFIP/IEEE International Symposium on Integrated Network Management.

[18]  Dawn Xiaodong Song,et al.  Fig: Automatic Fingerprint Generation , 2007, NDSS.

[19]  Lloyd G. Greenwald,et al.  Toward Undetected Operating System Fingerprinting , 2007, WOOT.

[20]  Ian H. Witten, et al.  The WEKA data mining software: an update, 2009, SIGKDD Explorations.