Finding the longest similar subsequence of thumbprints for intrusion detection

One way to detect intruders on the Internet is to compare the similarity of two thumbprints. A thumbprint is a summary that characterizes a connection. The packet gap thumbprint is a sequence of non-negative real numbers representing the time gaps between "send" packets. This paper formalizes the similarity between two sequences of non-negative real numbers by introducing ε-similarity, partial sums, and the longest ε-similar subsequence (LSS). The length of the LSS measures how similar two sequences are. The LSS problem is a generalization of the well-known longest common subsequence (LCS) problem. The goal of this paper is to find an optimal solution to the LSS problem. We analyze the properties of partial sums and propose focusing on the minimum matched partial sum, which leads to an optimal solution to LSS while reducing the problem space. Because the LSS problem has optimal substructure, we propose an algorithm based on dynamic programming whose time complexity is O(m²n²) for sequences of lengths m and n. By exploiting a property of the partial sums, we reduce the time complexity to O(mn(m+n)).
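The dynamic-programming idea can be illustrated with a minimal sketch built on an assumed reading of the definitions, which the abstract does not spell out. Assume each sequence element is the gap between consecutive "send" packets, so packets 0..m of a connection are separated by gaps a[0..m-1]. Selecting a subsequence of packets merges the gaps between selected packets into partial sums, and two selected gaps are treated as ε-similar when their partial sums differ by at most eps. The function names `prefix_sums` and `longest_eps_similar` are hypothetical, and this is not the paper's algorithm: it is a plain O(m²n²) recurrence, without the minimum-matched-partial-sum refinement or the O(mn(m+n)) speedup.

```python
from itertools import accumulate


def prefix_sums(gaps):
    """Return [0, g0, g0+g1, ...]; entry k is the time of packet k
    relative to packet 0 (hypothetical helper)."""
    return [0.0] + list(accumulate(gaps))


def longest_eps_similar(a, b, eps):
    """Hypothetical O(m^2 n^2) DP under an ASSUMED definition: the number
    of matched gaps in the longest chain of selected packets in both
    connections whose induced gaps (partial sums of the original gaps)
    pairwise differ by at most eps."""
    pa, pb = prefix_sums(a), prefix_sums(b)
    m, n = len(a), len(b)
    # best[i][j]: most matched gaps in a chain ending at packet i of A
    # and packet j of B; 0 means the chain starts at (i, j).
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            # Try every earlier pair of selected packets (ip, jp).
            for ip in range(i):
                for jp in range(j):
                    gap_a = pa[i] - pa[ip]  # partial sum a[ip..i-1]
                    gap_b = pb[j] - pb[jp]  # partial sum b[jp..j-1]
                    if abs(gap_a - gap_b) <= eps:
                        best[i][j] = max(best[i][j], best[ip][jp] + 1)
    return max(max(row) for row in best)
```

For example, `longest_eps_similar([1.0, 2.0, 3.0], [3.0, 3.0], 0.0)` returns 2: merging the first two gaps of the first sequence gives 3.0, which matches both gaps of the second sequence. Prefix sums make each partial sum an O(1) subtraction, so the four nested loops account for the O(m²n²) bound; the paper's reduction to O(mn(m+n)) relies on properties of partial sums that this sketch does not model.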
