FiG: Automatic Fingerprint Generation

Fingerprinting is a widely used technique in the networking and security communities for identifying which of several implementations of the same networking software is running on a remote host. A fingerprint is essentially a set of queries plus a classification function that, applied to the responses to those queries, assigns the software to a class. So far, identifying fingerprints has remained largely an arduous, manual process. This paper proposes a novel approach to fingerprint generation that automatically explores a set of candidate queries and applies machine learning techniques to identify the set of valid queries and to learn an adequate classification function. Our results show that such an automatic process can generate accurate fingerprints that classify each piece of software into its proper class, and that the space of candidate queries remains largely unexploited, with many new queries awaiting discovery. In a preliminary exploration, we identify new queries not previously used for fingerprinting.
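The query-plus-classifier structure described above can be illustrated with a small sketch. It assumes that each probed host's answers are stored as a map from a hypothetical query identifier to its response. It keeps a candidate query only if every host of the same implementation answers it identically and at least two implementations answer it differently; this selection criterion is an assumption for illustration. It also substitutes a simple match-count classifier for the machine learning techniques the abstract mentions. All function names, query identifiers, and responses are hypothetical and are not the paper's own.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one host's answers, keyed by query id.
Responses = Dict[str, str]
Signature = Dict[str, Optional[str]]


def select_queries(training: Dict[str, List[Responses]]) -> List[str]:
    """Keep queries that every host of an implementation answers the same
    way (consistent) and that at least two implementations answer
    differently (discriminating)."""
    candidates = sorted({q for hosts in training.values() for h in hosts for q in h})
    selected = []
    for q in candidates:
        per_impl: Dict[str, Optional[str]] = {}
        consistent = True
        for impl, hosts in training.items():
            answers = {h.get(q) for h in hosts}  # a missing answer counts as None
            if len(answers) != 1:
                consistent = False
                break
            per_impl[impl] = answers.pop()
        if consistent and len(set(per_impl.values())) > 1:
            selected.append(q)
    return selected


def build_signatures(training: Dict[str, List[Responses]],
                     queries: List[str]) -> Dict[str, Signature]:
    # Selected queries are consistent within an implementation, so the
    # first training host's answers can stand for the whole class
    # (assumes every implementation has at least one training host).
    return {impl: {q: hosts[0].get(q) for q in queries}
            for impl, hosts in training.items()}


def classify(signatures: Dict[str, Signature], queries: List[str],
             observed: Responses) -> str:
    # Score each class by how many selected queries match its signature;
    # max() keeps the first class encountered on ties.
    scores = {impl: sum(observed.get(q) == sig[q] for q in queries)
              for impl, sig in signatures.items()}
    return max(scores, key=scores.get)


# Made-up training responses from hosts whose implementation is known;
# the status codes are illustrative, not real server behavior.
TRAINING = {
    "server_A": [
        {"get_root": "200", "bad_method": "400", "long_uri": "414", "bad_version": "505"},
        {"get_root": "200", "bad_method": "400", "long_uri": "400", "bad_version": "505"},
    ],
    "server_B": [
        {"get_root": "200", "bad_method": "501", "long_uri": "414", "bad_version": "505"},
    ],
    "server_C": [
        {"get_root": "200", "bad_method": "405", "long_uri": "414", "bad_version": "400"},
    ],
}

queries = select_queries(TRAINING)
signatures = build_signatures(TRAINING, queries)
unknown = {"get_root": "200", "bad_method": "501", "long_uri": "414", "bad_version": "505"}
print(queries)                                 # ['bad_method', 'bad_version']
print(classify(signatures, queries, unknown))  # server_B
```

In this toy data, `get_root` is dropped because every implementation answers it the same way, and `long_uri` is dropped because two `server_A` hosts disagree. A learned classifier, such as a decision tree over the selected responses, could take the place of `classify`. The match-count version only shows where the classification function fits.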
