Graph Classification via Topological and Label Attributes

Graph classification is an important data mining task, and various graph kernel methods have recently been proposed for it. These methods have proven effective, but they tend to have high computational overhead. In this paper, we propose an alternative approach to graph classification based on feature vectors constructed from global topological attributes as well as global label features. The main idea is that graphs from the same class should have similar topological and label attributes. Our method is simple and easy to implement. Through a detailed comparison on real benchmark datasets, we show that our topological and label feature-based approach delivers better or competitive classification accuracy and is substantially faster than graph kernel methods. Among the methods compared, it is the most effective for large unlabeled graphs.
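As a rough illustration of the feature-vector idea, the following minimal sketch maps one graph to a fixed-length vector of global attributes. The abstract does not list the attributes used, so the ones chosen here (node count, edge count, average degree, density, average clustering coefficient, and a normalized label histogram) are assumptions for illustration only, and the function name `global_features` is hypothetical.

```python
from collections import Counter


def global_features(adj, labels, label_alphabet):
    """Map one undirected simple graph to a fixed-length feature vector.

    adj            -- dict: node -> set of neighbouring nodes
    labels         -- dict: node -> node label
    label_alphabet -- ordered list of all labels seen across the dataset

    The attribute set is an illustrative assumption, not the paper's list.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    avg_degree = 2.0 * m / n if n else 0.0
    density = 2.0 * m / (n * (n - 1)) if n > 1 else 0.0

    # Average local clustering coefficient; nodes of degree < 2 contribute 0.
    total_cc = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Edges among the neighbours of this node (each seen from both ends).
        links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
        total_cc += 2.0 * links / (k * (k - 1))
    avg_clustering = total_cc / n if n else 0.0

    # Global label feature: fraction of nodes carrying each label.
    counts = Counter(labels[v] for v in adj)
    label_hist = [counts[lab] / n if n else 0.0 for lab in label_alphabet]

    return [n, m, avg_degree, density, avg_clustering] + label_hist


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
labels = {0: "C", 1: "C", 2: "N", 3: "O"}
fv = global_features(adj, labels, ["C", "N", "O"])
print(fv)
```

Because every graph yields a vector of the same length, the vectors for a dataset can be stacked into a matrix and passed to any standard classifier. For unlabeled graphs, the label histogram would simply be omitted.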
