Approximate Kernels for Trees

Convolution kernels for trees provide an effective means for learning with tree-structured data, such as parse trees of natural language sentences. Unfortunately, the computation time of tree kernels is quadratic in the size of the trees, as all pairs of nodes need to be compared; large trees thus render convolution kernels inapplicable. In this paper, we propose a simple but efficient approximation technique for tree kernels. The approximate tree kernel (ATK) accelerates computation by selecting a sparse and discriminative subset of subtrees using a linear program. The kernel allows for incorporating domain knowledge and controlling the overall computation time through additional constraints. Experiments on applications in natural language processing and web spam detection demonstrate the efficiency of the approximate kernels. We observe run-time improvements of two orders of magnitude while preserving the discriminative expressiveness and classification rates of regular convolution kernels.
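To make the idea concrete, the following is a minimal sketch, not the paper's implementation: an exact convolution tree kernel in the style of Collins and Duffy, which compares all node pairs, next to an approximate variant that restricts the comparison to nodes whose labels belong to a selected subset. The tree representation, function names, and the hand-picked `selected_labels` set are assumptions for illustration; in the approach described above, the selection would be obtained by solving a linear program rather than fixed by hand.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    label: str                              # e.g., grammar symbol of a parse-tree node
    children: List["Node"] = field(default_factory=list)


def nodes(t: Node) -> Iterator[Node]:
    """Yield all nodes of the tree rooted at t."""
    yield t
    for c in t.children:
        yield from nodes(c)


def production(n: Node) -> Tuple[str, Tuple[str, ...]]:
    """A node's production: its label and the labels of its children."""
    return n.label, tuple(c.label for c in n.children)


def count_shared(u: Node, v: Node) -> int:
    """Number of common subtrees rooted at both u and v (Collins-Duffy recursion)."""
    if production(u) != production(v):
        return 0
    if not u.children:                      # matching leaves
        return 1
    result = 1
    for cu, cv in zip(u.children, v.children):
        result *= 1 + count_shared(cu, cv)
    return result


def tree_kernel(s: Node, t: Node) -> int:
    """Exact convolution kernel: compares all pairs of nodes (quadratic in tree size)."""
    return sum(count_shared(u, v) for u in nodes(s) for v in nodes(t))


def approx_tree_kernel(s: Node, t: Node, selected_labels: Set[str]) -> int:
    """Approximate kernel: only nodes with selected labels act as subtree roots,
    so far fewer node pairs enter the quadratic comparison."""
    return sum(
        count_shared(u, v)
        for u in nodes(s) if u.label in selected_labels
        for v in nodes(t) if v.label in selected_labels
    )


if __name__ == "__main__":
    # Two toy parse trees sharing an NP subtree.
    s = Node("S", [Node("NP", [Node("D"), Node("N")]), Node("VP", [Node("V")])])
    t = Node("S", [Node("NP", [Node("D"), Node("N")]), Node("VP", [Node("V"), Node("NP", [Node("D"), Node("N")])])])
    print(tree_kernel(s, t))                           # exact kernel value
    print(approx_tree_kernel(s, t, {"NP"}))            # restricted to NP-rooted subtrees
```

The sparser the selected label set, the fewer node pairs are compared, which is the source of the run-time savings reported above; the selection itself is what the linear program optimizes so that discriminative subtrees are retained.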