Deep topology classification: A new approach for massive graph classification

Graph classification is a key challenge and an active area of research in the many scientific fields that use graphs to represent data, where it can be critical for identifying and labelling unknown graphs within a dataset. Graph classification poses two distinct problems: the classification of elements within a graph and the classification of the entire graph. Whilst there is considerable work on the first problem, the efficient and accurate classification of massive graphs into one or more classes has, thus far, received less attention. In this paper we propose the Deep Topology Classification (DTC) approach for global graph classification. DTC extracts both global and vertex-level topological features from a graph to create a highly discriminative representation in feature space. A deep feed-forward neural network is designed and trained to classify these graph feature vectors. The approach is shown to achieve over 99% accuracy in discerning graph classes across two datasets. Additionally, it is shown to be more accurate than current state-of-the-art approaches in both binary and multi-class graph classification tasks.
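As an illustration of what a fixed-length graph feature vector combining global and vertex-level topology might look like, the following is a minimal Python sketch. It is not the feature set used by DTC, which the abstract does not specify: the choice of features (vertex count, edge count, density, degree, local clustering coefficient), the summary statistics, and all function names are assumptions made for this example. A vector of this kind is what a feed-forward classifier would take as input.

```python
from itertools import combinations
from statistics import mean, pstdev


def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def summarise(values):
    """Collapse a per-vertex feature into four fixed-length statistics."""
    return [mean(values), pstdev(values), min(values), max(values)]


def graph_feature_vector(edges):
    """Return [n, m, density, degree stats (4), clustering stats (4)].

    Assumes a non-empty edge list; the feature choice is hypothetical.
    """
    adj = build_adjacency(edges)
    n = len(adj)                                  # global: vertex count
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # global: edge count
    density = 2.0 * m / (n * (n - 1)) if n > 1 else 0.0
    degrees = [len(adj[v]) for v in adj]          # vertex-level feature
    clustering = [local_clustering(adj, v) for v in adj]  # vertex-level
    return [n, m, density] + summarise(degrees) + summarise(clustering)


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(graph_feature_vector(triangle))
```

Summarising each vertex-level feature with a fixed set of statistics keeps the vector length independent of graph size, so graphs with different numbers of vertices can share one classifier input layer.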
