Towards a Taxonomy of Graph Learning Datasets

Graph neural networks (GNNs) have attracted much attention due to their ability to leverage the intrinsic geometries of the underlying data. Many types of GNN models have been developed, along with many benchmarking procedures meant to demonstrate the superiority of one model over others, yet there is little systematic understanding of the underlying benchmark datasets or of which aspects of a model they actually test. Here, we provide a principled approach to taxonomizing graph benchmarking datasets: we carefully design a collection of graph perturbations that probe the essential data characteristics GNN models leverage to make predictions. Our data-driven taxonomization of graph datasets provides a new understanding of critical dataset characteristics, which will enable better model evaluation and the development of more specialized GNN models.
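
To make the idea of probing a dataset with perturbations concrete, the following is a minimal sketch and not the authors' implementation. The two perturbations (`drop_edges`, `shuffle_features`), the dictionary-based graph representation, and the `sensitivity` helper are all hypothetical names chosen for illustration. The sketch only shows the general pattern: apply a perturbation that ablates one data characteristic, then measure how much a score changes.

```python
import random
from statistics import mean


def drop_edges(graph):
    """Hypothetical perturbation: remove every edge, ablating graph structure."""
    return {"edges": [], "features": list(graph["features"])}


def shuffle_features(graph, seed=0):
    """Hypothetical perturbation: permute node features across nodes,
    breaking the association between a node and its own features."""
    rng = random.Random(seed)
    feats = list(graph["features"])
    rng.shuffle(feats)
    return {"edges": list(graph["edges"]), "features": feats}


def sensitivity(score_fn, graphs, perturb):
    """Mean score on the original graphs minus mean score on perturbed copies.
    score_fn stands in for any per-graph evaluation (an assumption here)."""
    original = mean(score_fn(g) for g in graphs)
    perturbed = mean(score_fn(perturb(g)) for g in graphs)
    return original - perturbed


# Toy data: one triangle graph with a scalar feature per node.
toy = [{"edges": [(0, 1), (1, 2), (2, 0)], "features": [0.1, 0.5, 0.9]}]

# A stand-in score that depends only on structure (number of edges).
edge_count = lambda g: float(len(g["edges"]))

structure_drop = sensitivity(edge_count, toy, drop_edges)
feature_drop = sensitivity(edge_count, toy, shuffle_features)
```

In this toy setting the structure-only score changes under `drop_edges` and is unaffected by `shuffle_features`. Comparing such sensitivity profiles across datasets is the kind of signal a perturbation-based taxonomy could group on, under the assumptions stated above.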
