Paper information - A memory efficient graph kernel

In this paper, we show how learning models generated by a recently introduced state-of-the-art kernel for graphs can be optimized with respect to memory occupancy. After a brief description of the kernel, we introduce a novel representation of the kernel's explicit feature space, based on a hash function, that reduces the amount of memory needed both during the training phase and to represent the final learned model. Subsequently, we study the application of a feature selection strategy based on the F-score to further reduce the number of features in the final model. On two representative datasets involving binary classification of chemical graphs, we show that it is possible to significantly reduce the memory occupancy of the final model (by up to one order of magnitude) with a moderate loss in classification performance.
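
The two ideas in the abstract, a hash-based explicit feature representation and F-score feature selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hashing scheme, so `hash_feature`, the use of BLAKE2b, the 32-bit key size, and the string encodings of features are all assumptions made for illustration. The F-score follows the definition commonly used for SVM feature selection (between-class separation of the feature means divided by the sum of the within-class sample variances), which is assumed here to match the paper's usage.

```python
import hashlib
from collections import Counter


def hash_feature(encoding: str, bits: int = 32) -> int:
    """Map a feature's string encoding to a compact integer key.

    Hypothetical scheme: the paper's actual hash function is not given
    in the abstract.
    """
    digest = hashlib.blake2b(encoding.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def explicit_features(encodings):
    """Sparse explicit feature vector: hashed key -> occurrence count."""
    return Counter(hash_feature(e) for e in encodings)


def f_scores(vectors, labels):
    """F-score of every feature for binary labels (positive if label > 0).

    Assumes at least two examples in each class.
    """
    pos = [v for v, y in zip(vectors, labels) if y > 0]
    neg = [v for v, y in zip(vectors, labels) if y <= 0]
    scores = {}
    for key in set().union(*vectors):
        p = [v.get(key, 0) for v in pos]
        n = [v.get(key, 0) for v in neg]
        mean_all = (sum(p) + sum(n)) / (len(p) + len(n))
        mean_p = sum(p) / len(p)
        mean_n = sum(n) / len(n)
        # Sample variances within each class.
        var_p = sum((x - mean_p) ** 2 for x in p) / (len(p) - 1)
        var_n = sum((x - mean_n) ** 2 for x in n) / (len(n) - 1)
        num = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
        den = var_p + var_n
        if den > 0:
            scores[key] = num / den
        else:
            # No within-class variation: perfectly discriminative or constant.
            scores[key] = float("inf") if num > 0 else 0.0
    return scores


def select_features(vectors, labels, threshold):
    """Drop every feature whose F-score is below the threshold."""
    scores = f_scores(vectors, labels)
    keep = {k for k, s in scores.items() if s >= threshold}
    return [{k: c for k, c in v.items() if k in keep} for v in vectors]
```

Storing integer keys instead of the full structural encoding of each feature is what saves memory in this sketch; the threshold passed to `select_features` then trades model size against classification performance.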

Alessandro Sperduti | Giovanni Da San Martino | Nicolò Navarin
