Much effort has been devoted to enhancing the Graph Convolutional Network (GCN) from the perspective of propagation, under the philosophy that "propagation is the essence of GCNNs". Unfortunately, propagation has an adverse effect, over-smoothing, which makes performance drop dramatically. Many variants have been proposed to prevent over-smoothing. However, the propagation perspective cannot provide an intuitive and unified interpretation of how these variants prevent over-smoothing. In this paper, we aim to provide a novel answer to the question "Why do attributes propagate in GCNNs?", which not only reveals the essence of over-smoothing but also explains why GCN extensions, including multi-scale GCN and GCN with initial residual, improve performance. To this end, an intuitive Graph Representation Learning (GRL) framework is presented. GRL simply constrains each node representation to stay similar to the original node attributes, and encourages connected nodes to possess similar representations (pairwise constraint). Based on the proposed GRL, existing GCN and its extensions can be shown to be different numerical optimization algorithms, such as gradient descent, applied to the proposed GRL framework. Inspired by the superiority of conjugate gradient descent over plain gradient descent, a novel Graph Conjugate Convolutional (GCC) network is presented to approximate the solution of GRL with fast convergence. Specifically, GCC feeds the information obtained by the previous layer, which can be represented as the difference between that layer's input and output, into the next layer. Extensive experiments demonstrate the superior performance of GCC.

Introduction

Graph Neural Networks (GNNs) (Wu et al. 2020; Xu et al. 2019) have become a hot topic in deep learning because of their potential for modeling irregular data. GNNs have been widely used and have achieved state-of-the-art performance in many fields, such as computer vision, natural language processing (Yang et al. 2020), traffic forecasting, chemistry and medical analysis. Existing GNNs fall into two categories: spectral methods (Defferrard, Bresson, and Vandergheynst 2016) and spatial ones (Hamilton, Ying, and Leskovec 2017; Gilmer et al. 2017; Yang et al. 2019b,a; Jin et al. 2019, 2020, 2021). Graph Convolutional Network (GCN) (Kipf and Welling 2017), a simple, well-behaved and insightful GNN, bridges the two perspectives by proving that its propagation rule can be motivated as a first-order approximation of spectral graph convolutions. Recent progress also demonstrates the equivalence of the spatial and spectral views (Balcilar et al. 2020). Many efforts have been made to enhance GCN from the perspective of propagation (Gilmer et al. 2017), such as learnable propagation weights in Graph Attention Network (GAT) (Velickovic et al. 2018), Gated Attention Network (GaAN) (Zhang et al. 2018) and Probabilistic GCN (Yang et al. 2020), structural neighbourhoods in Geom-GCN (Pei et al. 2020), and multi-scale (multi-hop) combination in N-GCN (Abu-El-Haija et al. 2019a), MixHop (Abu-El-Haija et al. 2019b), LanczosNet (Liao et al. 2019) and Krylov GCN (Luan et al. 2019). Their common philosophy is that "propagation is the essence of GCNNs", and the success of GCNs is attributed to the Laplacian smoothing induced by propagation (Li, Han, and Wu 2018).
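For concreteness, the layer-wise propagation rule of GCN (Kipf and Welling 2017) referred to above is

$$ H^{(l+1)} = \sigma\big( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \big), \qquad \tilde{A} = A + I_N, \quad \tilde{D}_{ii} = \textstyle\sum_j \tilde{A}_{ij}, $$

where $H^{(l)}$ and $W^{(l)}$ are the node representations and trainable weights of the $l$-th layer. One plausible instantiation of the GRL objective sketched in the abstract, written with node representations $H$, original attributes $X$, normalized graph Laplacian $L$, and a trade-off weight $\lambda$ that we introduce here only for illustration, is

$$ \min_{H} \; \mathcal{J}(H) = \|H - X\|_F^2 + \lambda \, \operatorname{tr}\big(H^{\top} L H\big), $$

whose first term keeps each representation close to its original attributes and whose second term is the pairwise constraint encouraging connected nodes to share similar representations.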
Unfortunately, the most serious issue of GNNs is over-smoothing, caused by repeated propagation through stacked graph convolution layers, which makes performance drop dramatically. Recently, (Oono and Suzuki 2020) showed that GNNs exponentially lose expressive power by generalizing the forward propagation of a GCN to a specific dynamical system. To prevent over-smoothing, two kinds of methods have been proposed. Methods in the first category constrain the propagation itself. Disentangled GCN (Ma et al. 2019) propagates each attribute over only part of the edges. DropEdge (Rong et al. 2020) randomly removes a certain number of edges from the input graph at each training epoch to reduce the adverse effect of message passing. Methods in the second category constrain the propagation result with the original attributes. PageRank-GCN (Klicpera, Bojchevski, and Günnemann 2019) integrates personalized PageRank into GCN to retain the original attributes. JKNet (Xu et al. 2018) employs dense connections for multi-hop message passing, while DeepGCN (Li et al. 2019) and GCNII (Chen et al. 2020) incorporate residual connections into GCN to facilitate deep architectures. However, the propagation perspective cannot provide an intuitive and unified interpretation of how these methods prevent over-smoothing.
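Under the illustrative objective $\mathcal{J}(H)$ above, the stationary condition is the linear system $(I + \lambda L) H = X$. The sketch below contrasts plain gradient descent on $\mathcal{J}(H)$ with a conjugate-gradient solve of that system; the residual carried between steps plays the role of the "difference between the input and output of the previous layer" that the abstract ascribes to GCC. This is a minimal sketch under our assumed objective, not the authors' exact GCC architecture; all function names, the step size, and the toy graph are illustrative.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} (A + I) D^{-1/2}: the smoothing operator behind GCN-style propagation."""
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.eye(A.shape[0]) - D_inv_sqrt @ A_hat @ D_inv_sqrt

def grl_gradient_descent(A, X, lam=1.0, steps=10, eta=0.2):
    """Plain gradient descent on J(H) = ||H - X||_F^2 + lam * tr(H^T L H).
    Each step mixes the current representation with the attributes and one round
    of neighbourhood smoothing, resembling one propagation layer."""
    L = normalized_laplacian(A)
    H = X.copy()
    for _ in range(steps):
        grad = (H - X) + lam * (L @ H)   # constant factor 2 absorbed into the step size eta
        H = H - eta * grad
    return H

def grl_conjugate_gradient(A, X, lam=1.0, steps=10):
    """Conjugate-gradient solve of (I + lam * L) H = X, the stationary point of J(H).
    The residual R left over from the previous step drives the next search
    direction D, mirroring the residual-reuse idea attributed to GCC."""
    M = np.eye(A.shape[0]) + lam * normalized_laplacian(A)   # symmetric positive definite
    H = X.copy()
    R = X - M @ H          # current residual
    D = R.copy()           # current search direction
    for _ in range(steps):
        if np.linalg.norm(R) < 1e-10:   # already converged
            break
        MD = M @ D
        alpha = np.sum(R * R) / np.sum(D * MD)
        H = H + alpha * D
        R_new = R - alpha * MD
        beta = np.sum(R_new * R_new) / np.sum(R * R)
        D = R_new + beta * D
        R = R_new
    return H

if __name__ == "__main__":
    # Toy 4-node path graph with 2-dimensional attributes.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    X = np.random.default_rng(0).normal(size=(4, 2))
    H_exact = np.linalg.solve(np.eye(4) + normalized_laplacian(A), X)
    for k in (1, 2, 4):
        err_gd = np.linalg.norm(grl_gradient_descent(A, X, steps=k) - H_exact)
        err_cg = np.linalg.norm(grl_conjugate_gradient(A, X, steps=k) - H_exact)
        print(f"steps={k}: gradient descent err={err_gd:.4f}, conjugate gradient err={err_cg:.4f}")
```

On this toy graph the conjugate-gradient iterate reaches the exact GRL solution within a handful of steps, while gradient descent with the same number of steps is still far from it, which is the convergence advantage the paper uses to motivate GCC.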
References

[1] Lise Getoor, et al. Collective Classification in Network Data, 2008, AI Mag.
[2] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Ziyang Liu, et al. BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich Networks, 2020, ArXiv.
[4] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[5] Ruslan Salakhutdinov, et al. Revisiting Semi-Supervised Learning with Graph Embeddings, 2016, ICML.
[6] Ken-ichi Kawarabayashi, et al. Representation Learning on Graphs with Jumping Knowledge Networks, 2018, ICML.
[7] Xiao-Ming Wu, et al. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning, 2018, AAAI.
[8] Tingyang Xu, et al. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification, 2020, ICLR.
[9] Gene H. Golub, et al. Matrix Computations (3rd ed.), 1996.
[10] Le Song, et al. Stochastic Training of Graph Convolutional Networks with Variance Reduction, 2017, ICML.
[11] Kilian Q. Weinberger, et al. Simplifying Graph Convolutional Networks, 2019, ICML.
[12] Taiji Suzuki, et al. Graph Neural Networks Exponentially Lose Expressive Power for Node Classification, 2019, ICLR.
[13] Henry P. Decell. An Application of the Cayley-Hamilton Theorem to Generalized Matrix Inversion, 1965.
[14] Stephan Günnemann, et al. Predict then Propagate: Graph Neural Networks meet Personalized PageRank, 2018, ICLR.
[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[16] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Vladlen Koltun, et al. Robust Continuous Clustering, 2017, Proceedings of the National Academy of Sciences.
[18] Paul Honeine, et al. Bridging the Gap Between Spectral and Spatial Domains in Graph Neural Networks, 2020, ArXiv.
[19] Joonseok Lee, et al. N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification, 2018, UAI.
[20] Liang Yang, et al. Masked Graph Convolutional Network, 2019, IJCAI.
[21] Kevin Chen-Chuan Chang, et al. Geom-GCN: Geometric Graph Convolutional Networks, 2020, ICLR.
[22] Weixiong Zhang, et al. Graph Convolutional Networks Meet Markov Random Fields: Semi-Supervised Community Detection in Attribute Networks, 2019, AAAI.
[23] Samuel S. Schoenholz, et al. Neural Message Passing for Quantum Chemistry, 2017, ICML.
[24] Jure Leskovec, et al. Inductive Representation Learning on Large Graphs, 2017, NIPS.
[25] Pietro Liò, et al. Graph Attention Networks, 2017, ICLR.
[26] Xiaochun Cao, et al. Topology Optimization based Graph Convolutional Network, 2019, IJCAI.
[27] Samy Bengio, et al. Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks, 2019, KDD.
[28] Kristina Lerman, et al. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing, 2019, ICML.
[29] Xiaolong Li, et al. GeniePath: Graph Neural Networks with Adaptive Receptive Paths, 2018, AAAI.
[30] Yaliang Li, et al. Simple and Deep Graph Convolutional Networks, 2020, ICML.
[31] Hao Ma, et al. GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs, 2018, UAI.
[32] Bernard Ghanem, et al. DeepGCNs: Can GCNs Go As Deep As CNNs?, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Philip S. Yu, et al. GCN for HIN via Implicit Utilization of Attention and Meta-Paths, 2020, IEEE Transactions on Knowledge and Data Engineering.
[34] Zhizhen Zhao, et al. LanczosNet: Multi-Scale Deep Graph Convolutional Networks, 2019, ICLR.
[35] Philip S. Yu, et al. A Comprehensive Survey on Graph Neural Networks, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[36] Xavier Bresson, et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, 2016, NIPS.
[37] Doina Precup, et al. Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks, 2019, NeurIPS.
[38] Wenwu Zhu, et al. Disentangled Graph Convolutional Networks, 2019, ICML.
[39] Jure Leskovec, et al. How Powerful are Graph Neural Networks?, 2018, ICLR.
[40] Xiaochun Cao, et al. Graph Attention Topic Modeling Network, 2020, WWW.
[41] Stuart Geman, et al. Statistical Methods for Tomographic Image Reconstruction, 1987.
[42] Xiaochun Cao, et al. Probabilistic Graph Convolutional Network via Topology-Constrained Latent Space Model, 2020, IEEE Transactions on Cybernetics.