$K$-Ary Tree Hashing for Fast Graph Classification

Existing graph classification usually relies on an exhaustive enumeration of substructure patterns, where the number of substructures expands exponentially with the size of the graph set. Recently, the Weisfeiler-Lehman (WL) graph kernel has achieved the best performance in terms of both accuracy and efficiency among state-of-the-art methods. However, it is still time-consuming, especially for large-scale graph classification tasks. In this paper, we present a $K$-Ary Tree based Hashing (KATH) algorithm, which obtains competitive accuracy with a very fast runtime. The main idea of KATH is to construct a traversal table to quickly approximate the subtree patterns in WL using $K$-ary trees. Based on the traversal table, KATH employs a recursive indexing process that performs only $r$ matrix indexing operations to generate all $(r-1)$-depth $K$-ary trees, where the leaf node labels of a tree uniquely specify the pattern. After that, the MinHash scheme is used to fingerprint the acquired subtree patterns for a graph. Our experimental results on both real-world and synthetic data sets show that KATH runs significantly faster than state-of-the-art methods while achieving competitive or better accuracy.
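The pipeline described above (traversal table, repeated matrix indexing, MinHash fingerprint) can be illustrated with a minimal NumPy sketch. It is one plausible reading of the abstract, not the authors' implementation. The function names (`traversal_table`, `leaf_patterns`, `minhash`), the dummy padding node with label `-1`, the deterministic truncation to the first $K$ neighbours sorted by label, and the linear hash family are all assumptions made for illustration.

```python
import numpy as np

def traversal_table(adj, labels, K):
    """Row v lists up to K neighbours of v, sorted by label (assumed ordering).

    Rows with fewer than K neighbours are padded with a dummy node of index n,
    whose own row points only to itself.
    """
    n = len(adj)
    table = np.full((n + 1, K), n, dtype=np.int64)
    for v, nbrs in enumerate(adj):
        chosen = sorted(nbrs, key=lambda u: (labels[u], u))[:K]
        table[v, :len(chosen)] = chosen
    return table

def leaf_patterns(table, labels, depth):
    """Return the leaf labels of the depth-`depth` K-ary tree rooted at each node."""
    n = table.shape[0] - 1
    ext_labels = np.append(np.asarray(labels), -1)  # -1 labels the dummy node
    frontier = np.arange(n).reshape(n, 1)
    for _ in range(depth):
        # One matrix indexing step expands every tree by one level.
        frontier = table[frontier].reshape(n, -1)
    return ext_labels[frontier]  # final indexing step maps nodes to labels

def minhash(patterns, num_hashes, seed=0):
    """Fingerprint a set of leaf-label patterns with random linear hashes (assumed family)."""
    P = (1 << 31) - 1  # Mersenne prime used as the modulus
    rng = np.random.default_rng(seed)
    a = rng.integers(1, P, size=num_hashes)
    b = rng.integers(0, P, size=num_hashes)
    keys = np.array([hash(tuple(row)) % P for row in patterns.tolist()], dtype=np.int64)
    return ((a[:, None] * keys[None, :] + b[:, None]) % P).min(axis=1)

# Toy path graph 0 - 1 - 2 with node labels 1, 2, 1 and K = 2.
adj = [[1], [0, 2], [1]]
labels = [1, 2, 1]
table = traversal_table(adj, labels, K=2)
patterns = leaf_patterns(table, labels, depth=2)
fingerprint = minhash(patterns, num_hashes=8)
```

In this sketch, a tree of depth $r-1$ takes $r-1$ indexing steps on the table plus one on the label vector. Two graphs can then be compared by the fraction of fingerprint positions on which they agree, which estimates the Jaccard similarity of their pattern sets.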
