Approximate Tree Kernels

Convolution kernels for trees provide a simple means of learning with tree-structured data. The computation time of tree kernels is quadratic in the size of the trees, since all pairs of nodes need to be compared. Thus, large parse trees, obtained from HTML documents or structured network data, render convolution kernels inapplicable. In this article, we propose an effective approximation technique for parse tree kernels. The approximate tree kernels (ATKs) limit kernel computation to a sparse subset of relevant subtrees and discard redundant structures, so that training and testing of kernel-based learning methods are significantly accelerated. We devise linear programming approaches for identifying such subsets for supervised and unsupervised learning tasks, respectively. Empirically, the approximate tree kernels attain run-time improvements of up to three orders of magnitude while preserving the predictive accuracy of regular tree kernels. For unsupervised tasks, the approximate tree kernels even lead to more accurate predictions by identifying relevant dimensions in feature space.
