Inductive Representation Learning in Large Attributed Graphs

Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data is central to the success of many machine learning tasks, such as classification, anomaly detection, and link prediction. Many existing techniques use random walks as a basis for learning features or for estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk and node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node, which has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes or to other graphs. Second, they are not space-efficient: a feature vector is learned for each node, which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs. To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of an attributed random walk. This walk is not tied to node identity; instead, it is based on learning a function $\Phi : \mathbf{x} \rightarrow w$ that maps a node attribute vector $\mathbf{x}$ to a type $w$. The framework serves as a basis for generalizing DeepWalk, node2vec, and many other previous methods that leverage traditional random walks.
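The abstract describes the attributed random walk only at a high level. The following Python sketch illustrates the idea under stated assumptions: the walk itself is an ordinary uniform random walk, but each visited node is emitted as its type $w = \Phi(\mathbf{x})$ instead of its identifier. The abstract says $\Phi$ is learned; purely for illustration, we substitute a hand-crafted logarithmic binning of each attribute (`phi_log_bin`, hypothetical). The helper names `attributed_walk` and `attributed_walks` are also our own, not the authors' code.

```python
import random
from typing import Callable, Dict, List, Sequence

Graph = Dict[int, List[int]]        # adjacency list: node id -> neighbor ids
Attrs = Dict[int, Sequence[float]]  # node id -> attribute vector x


def phi_log_bin(x: Sequence[float]) -> str:
    """Hypothetical stand-in for Phi: log2-bin each attribute and join the bins.

    A value v >= 1 lands in bin floor(log2(v)) + 1; any v < 1 lands in bin 0.
    The joined bin indices form the type string w.
    """
    return "-".join(str(int(max(v, 0.0)).bit_length()) for v in x)


def attributed_walk(graph: Graph, attrs: Attrs, start: int, length: int,
                    phi: Callable[[Sequence[float]], str],
                    rng: random.Random) -> List[str]:
    """Uniform random walk in which each visited node is emitted as its type Phi(x)."""
    nodes = [start]
    while len(nodes) < length:
        nbrs = graph[nodes[-1]]
        if not nbrs:  # dead end: stop the walk early
            break
        nodes.append(rng.choice(nbrs))
    # Replace node identities with types, so the walk no longer depends on node ids.
    return [phi(attrs[v]) for v in nodes]


def attributed_walks(graph: Graph, attrs: Attrs, walks_per_node: int, length: int,
                     phi: Callable[[Sequence[float]], str],
                     seed: int = 0) -> List[List[str]]:
    """Start walks_per_node walks from every node; tokens are types, not node ids."""
    rng = random.Random(seed)
    return [attributed_walk(graph, attrs, v, length, phi, rng)
            for _ in range(walks_per_node) for v in graph]


if __name__ == "__main__":
    g: Graph = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
    x: Attrs = {v: [float(len(nbrs))] for v, nbrs in g.items()}  # attribute = degree
    walks = attributed_walks(g, x, walks_per_node=2, length=5, phi=phi_log_bin)
    vocab = sorted({w for walk in walks for w in walk})
    print("types:", vocab, "vs nodes:", len(g))  # types: ['1', '2'] vs nodes: 4
```

Because the walks are sequences of types, an embedding model trained on them (for example skip-gram, as used by DeepWalk and node2vec; not shown here) would learn one vector per type instead of one per node. The number of vectors is then bounded by the range of $\Phi$ rather than by the number of nodes. A node from an unseen graph can be mapped to an existing type simply by evaluating $\Phi$ on its attribute vector.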
