Grain: Improving Data Efficiency of Graph Neural Networks via Diversified Influence Maximization

Data selection methods, such as active learning and core-set selection, are useful tools for improving the data efficiency of deep learning models on large-scale datasets. However, recent deep learning models have moved beyond independent and identically distributed data to graph-structured data, such as social networks, e-commerce user-item graphs, and knowledge graphs. This evolution has led to the emergence of Graph Neural Networks (GNNs), which go beyond the models that existing data selection methods are designed for. We therefore present Grain, an efficient framework that opens up a new perspective by connecting data selection in GNNs with social influence maximization. By exploiting the common patterns of GNNs, Grain introduces a novel feature propagation concept, a diversified influence maximization objective with novel influence and diversity functions, and a greedy algorithm with an approximation guarantee into a unified framework. Empirical studies on public datasets demonstrate that Grain significantly improves both the performance and the efficiency of data selection (including active learning and core-set selection) for GNNs. To the best of our knowledge, this is the first attempt to bridge two largely parallel threads of research, data selection and social influence maximization, in the setting of GNNs, paving new ways for improving data efficiency.

PVLDB Reference Format: Wentao Zhang, Zhi Yang, Yexin Wang, Yu Shen, Yang Li, Liang Wang, Bin Cui. Grain: Improving Data Efficiency of Graph Neural Networks via Diversified Influence Maximization. PVLDB, 14(11): 2473-2482, 2021. doi:10.14778/3476249.3476295

PVLDB Availability Tag: The source code of this research paper has been made publicly available at https://github.com/zwt233/Grain.
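To give a concrete sense of the greedy scheme such an objective admits, below is a minimal Python sketch of the standard greedy maximization of a monotone submodular set function (a coverage-style influence term plus a diversity bonus), which carries the classic (1 - 1/e) approximation guarantee. The propagation routine, influence sets, and diversity function here are illustrative assumptions, not Grain's actual implementation, whose influence and diversity functions are defined over propagated node features.

    import numpy as np

    def propagate_features(adj, feats, k=2):
        """Toy k-step feature propagation (assumption: dense, row-normalized
        adjacency, in the spirit of SGC-style models): X_hat = A_norm^k @ X."""
        deg = adj.sum(axis=1, keepdims=True)
        a_norm = adj / np.clip(deg, 1.0, None)
        x_hat = feats
        for _ in range(k):
            x_hat = a_norm @ x_hat
        return x_hat

    def score(selected, influence_sets, diversity_fn):
        """Illustrative diversified-influence objective: coverage of the
        selected seeds' influence sets plus a diversity bonus. If both terms
        are monotone submodular, the greedy loop below inherits the classic
        (1 - 1/e) approximation guarantee."""
        covered = (set().union(*(influence_sets[v] for v in selected))
                   if selected else set())
        return len(covered) + diversity_fn(selected)

    def greedy_select(candidates, budget, influence_sets, diversity_fn):
        """Standard greedy maximization: repeatedly add the node with the
        largest marginal gain until the labeling budget is exhausted."""
        selected = []
        for _ in range(budget):
            base = score(selected, influence_sets, diversity_fn)
            best = max((v for v in candidates if v not in selected),
                       key=lambda v: score(selected + [v], influence_sets,
                                           diversity_fn) - base)
            selected.append(best)
        return selected

    if __name__ == "__main__":
        # Toy usage on hypothetical data: each node influences itself
        # and its one-hop neighbors.
        adj = np.array([[0, 1, 1, 0],
                        [1, 0, 0, 1],
                        [1, 0, 0, 0],
                        [0, 1, 0, 0]], dtype=float)
        influence_sets = {v: {v} | set(np.nonzero(adj[v])[0]) for v in range(4)}
        diversity = lambda s: len(s)  # placeholder; Grain uses feature-space diversity
        print(greedy_select(range(4), budget=2,
                            influence_sets=influence_sets, diversity_fn=diversity))

The design point this sketch illustrates is that once influence and diversity are cast as a single monotone submodular objective, node selection reduces to budgeted greedy maximization, so the approximation guarantee comes for free from Nemhauser et al.'s classic analysis.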
