Sequence-similarity kernels for SVMs to detect anomalies in system calls

In intrusion detection systems (IDSs), short sequences of system calls executed by running programs can be used as evidence to detect anomalies. In this paper, one-class support vector machines (SVMs) with sequence-similarity kernels are adopted as the anomaly detectors. An edit-distance-based kernel and a common-subsequence-based kernel are proposed to exploit the sequence information during detection. Efficient algorithms for computing the kernels are derived using dynamic programming and bit-parallelism. The experimental results indicate that the proposed kernels can significantly outperform the standard RBF kernel.
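As an illustration of the two kinds of sequence similarity named above, the following is a minimal Python sketch, not the paper's implementation. It computes the edit distance and the longest-common-subsequence (LCS) length of two system-call sequences by dynamic programming, and the LCS length again with a bit-parallel recurrence. The function names, the exponential form `exp(-gamma * d)` of the edit-distance kernel, the length normalization of the LCS kernel, and the sample call numbers are all assumptions made for this example; the abstract does not give the exact kernel definitions.

```python
import math
from typing import Hashable, Sequence


def edit_distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        curr = [i]
        for j, yj in enumerate(y, 1):
            cost = 0 if xi == yj else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def lcs_length_dp(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, 1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_length_bitparallel(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """LCS length with one bit per position of x.

    Python integers are arbitrary precision, so a single int stands in
    for the machine words a lower-level implementation would use.
    """
    m = len(x)
    mask = (1 << m) - 1
    match = {}  # symbol -> bitmask of the positions where it occurs in x
    for i, s in enumerate(x):
        match[s] = match.get(s, 0) | (1 << i)
    v = mask
    for s in y:
        u = v & match.get(s, 0)
        v = ((v + u) | (v - u)) & mask
    return m - bin(v).count("1")  # zero bits in v count the LCS length


def edit_kernel(x, y, gamma: float = 0.5) -> float:
    """Assumed form: exp(-gamma * edit distance). Not guaranteed to be
    positive semidefinite in general."""
    return math.exp(-gamma * edit_distance(x, y))


def lcs_kernel(x, y) -> float:
    """Assumed form: LCS length normalized by the geometric mean length."""
    if not x or not y:
        return 0.0
    return lcs_length_bitparallel(x, y) / math.sqrt(len(x) * len(y))


if __name__ == "__main__":
    # Hypothetical system-call ID windows, for illustration only.
    a = (5, 3, 3, 90, 6, 5, 3)
    b = (5, 3, 90, 90, 6, 4, 3)
    print(edit_distance(a, b), lcs_length_dp(a, b))
    print(edit_kernel(a, b), lcs_kernel(a, b))
```

A Gram matrix built from either function could then be passed to a one-class SVM implementation that accepts precomputed kernels.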
