Representing Graphs as Bag of Vertices and Partitions for Graph Classification

Graph classification is difficult because finding a good feature representation for graphs is challenging. Existing methods use topological metrics or local subgraphs as features, but finding discriminative subgraphs or computing some of the crucial topological metrics (such as diameter and shortest paths) has high time complexity, so these methods do not scale well when the graphs to be classified are large. A second issue is that the number of distinct graphs available per class for training a classification model is generally limited. This scarcity yields training sets with far fewer instances than the model has parameters, which leads to poor classification performance. In this work, we propose a novel approach to graph classification that uses two alternative graph representations: the bag of vertices and the bag of partitions. For the first representation, we use node features obtained by representation learning; for the second, we use traditional metric-based features. Our experiments with 43 real-life graphs from seven different domains show that the bag representation of a graph improves graph classification performance significantly. We show a 4–75% improvement with the vertex-based approach and a 4–36% improvement with the partition-based approach over the best existing methods. In addition, our vertex and partition multi-instance methods are on average 75 and 11 times faster, respectively, in feature construction time than the current best method.
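The bag-of-vertices idea can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the per-vertex features below (degree and local clustering coefficient) are a simple stand-in for learned node representations, the majority-vote aggregation is an assumption, and all function names (`vertex_features`, `bag_of_vertices`, `to_instances`, `predict_graph`) are hypothetical. The point is only that each graph contributes many labeled instances instead of one, which eases the scarcity of training graphs.

```python
from collections import Counter


def vertex_features(adj, v):
    """Stand-in features for vertex v: (degree, local clustering coefficient).

    Assumption: the paper uses learned node embeddings here; these two
    cheap metrics are used only to keep the sketch self-contained.
    """
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return (k, 0.0)
    # Count edges among the neighbors of v (each unordered pair once).
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return (k, 2.0 * links / (k * (k - 1)))


def bag_of_vertices(adj):
    """Represent a graph (dict: vertex -> set of neighbors) as a bag of vectors."""
    return [vertex_features(adj, v) for v in sorted(adj)]


def to_instances(graphs, labels):
    """Flatten bags into instances; every vertex inherits its graph's label."""
    X, y = [], []
    for adj, label in zip(graphs, labels):
        for feat in bag_of_vertices(adj):
            X.append(feat)
            y.append(label)
    return X, y


def predict_graph(adj, instance_classifier):
    """Label a graph by majority vote over its vertex-level predictions
    (hypothetical aggregation rule)."""
    votes = Counter(instance_classifier(f) for f in bag_of_vertices(adj))
    return votes.most_common(1)[0][0]


# Two toy graphs: a triangle and a 3-vertex path.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}

X, y = to_instances([triangle, path], ["dense", "sparse"])
# A hand-written instance classifier stands in for a trained model.
rule = lambda f: "dense" if f[1] > 0.5 else "sparse"
triangle_label = predict_graph(triangle, rule)
path_label = predict_graph(path, rule)
```

Two graphs yield six training instances in `X`, so any standard vector classifier could be trained on them in place of `rule`. A bag of partitions would follow the same pattern, with one feature vector per partition rather than per vertex.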
