Graph-based semi-supervised learning for relational networks

We address the problem of semi-supervised learning in relational networks, networks in which nodes are entities and links are the relationships or interactions between them. Typically this problem is confounded with the problem of graph-based semi-supervised learning (GSSL), because both problems represent the data as a graph and predict the missing class labels of nodes. However, not all graphs are created equally. In GSSL a graph is constructed, often from independent data, based on similarity. As such, edges tend to connect instances with the same class label. Relational networks, however, can be more heterogeneous and edges do not always indicate similarity. For instance, instead of links being more likely to connect nodes with the same class label, they may occur more frequently between nodes with different class labels (link-heterogeneity). Or nodes with the same class label do not necessarily have the same type of connectivity across the whole network (class-heterogeneity), e.g. in a network of sexual interactions we may observe links between opposite genders in some parts of the graph and links between the same genders in others. Performing classification in networks with different types of heterogeneity is a hard problem that is made harder still when we do not know a-priori the type or level of heterogeneity. Here we present two scalable approaches for graph-based semi-supervised learning for the more general case of relational networks. We demonstrate these approaches on synthetic and real-world networks that display different link patterns within and between classes. Compared to state-of-the-art approaches, ours give better classification performance without prior knowledge of how classes interact. In particular, our two-step label propagation algorithm gives consistently good accuracy and runs on networks of over 1.6 million nodes and 30 million edges in around 12 seconds.

[1]  Cristopher Moore,et al.  Phase transitions in semisupervised clustering of sparse networks , 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[2]  K. Reitz,et al.  Graph and Semigroup Homomorphisms on Networks of Relations , 1983 .

[3]  Zoubin Ghahramani,et al.  Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning , 2004, NIPS.

[4]  Rong Jin,et al.  Semi-Supervised Learning by Mixed Label Propagation , 2007, AAAI.

[5]  Xiaobo Zhou,et al.  A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network , 2010, BMC Bioinformatics.

[6]  Jon M. Kleinberg,et al.  Overview of the 2003 KDD Cup , 2003, SKDD.

[7]  Zheng Wang,et al.  Active learning for node classification in assortative and disassortative networks , 2011, KDD.

[8]  S. Borgatti,et al.  The class of all regular equivalences: Algebraic structure and computation☆ , 1989 .

[9]  H. White,et al.  “Structural Equivalence of Individuals in Social Networks” , 2022, The SAGE Encyclopedia of Research Design.

[10]  Leto Peel,et al.  Topological feature based classification , 2011, 14th International Conference on Information Fusion.

[11]  Koby Crammer,et al.  New Regularized Algorithms for Transductive Learning , 2009, ECML/PKDD.

[12]  Leto Peel,et al.  The ground truth about metadata and community detection in networks , 2016, Science Advances.

[13]  Thorsten Joachims,et al.  Transductive Learning via Spectral Graph Partitioning , 2003, ICML.

[14]  Stephen J. Wright,et al.  Dissimilarity in Graph-Based Semi-Supervised Classification , 2007, AISTATS.

[15]  William W. Cohen,et al.  Semi-Supervised Classification of Network Data Using Very Few Labels , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[16]  T. Snijders,et al.  Estimation and Prediction for Stochastic Blockstructures , 2001 .

[17]  Danai Koutra,et al.  Linearized and Single-Pass Belief Propagation , 2014, Proc. VLDB Endow..

[18]  Jennifer Neville,et al.  Analyzing the Transferability of Collective Inference Models Across Networks , 2015, 2015 IEEE International Conference on Data Mining Workshop (ICDMW).

[19]  Christos Faloutsos,et al.  CAMLP: Confidence-Aware Modulated Label Propagation , 2016, SDM.

[20]  Lise Getoor,et al.  Active Learning for Networked Data , 2010, ICML.

[21]  Christos Faloutsos,et al.  Using ghost edges for classification in sparsely labeled networks , 2008, KDD.

[22]  Werner Ulrich,et al.  BODY SIZES OF CONSUMERS AND THEIR RESOURCES , 2005 .

[23]  Hyunjung Shin,et al.  Prediction of Protein Function from Networks , 2006, Semi-Supervised Learning.

[24]  Wolfgang Gatterbauer,et al.  Semi-Supervised Learning with Heterophily , 2014, ArXiv.

[25]  Giorgio Valentini,et al.  COSNet: A Cost Sensitive Neural Network for Semi-supervised Learning in Graphs , 2011, ECML/PKDD.

[26]  Jure Leskovec,et al.  Learning to Discover Social Circles in Ego Networks , 2012, NIPS.

[27]  Corinna Cortes,et al.  Communities of interest , 2001, Intell. Data Anal..

[28]  Paul Van Dooren,et al.  A MEASURE OF SIMILARITY BETWEEN GRAPH VERTICES . WITH APPLICATIONS TO SYNONYM EXTRACTION AND WEB SEARCHING , 2002 .

[29]  Lada A. Adamic,et al.  The political blogosphere and the 2004 U.S. election: divided they blog , 2005, LinkKDD '05.

[30]  Santo Fortunato,et al.  Network structure, metadata and the prediction of missing nodes , 2016, ArXiv.

[31]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[32]  Xiao Zhang,et al.  Identification of core-periphery structure in networks , 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[33]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[34]  Lise Getoor,et al.  Collective Classification in Network Data , 2008, AI Mag..

[35]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[36]  Christos Faloutsos,et al.  Detecting Fraudulent Personalities in Networks of Online Auctioneers , 2006, PKDD.

[37]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[38]  D. Bu,et al.  Topological structure analysis of the protein-protein interaction network in budding yeast. , 2003, Nucleic acids research.

[39]  Mikhail Belkin,et al.  Semi-Supervised Learning on Riemannian Manifolds , 2004, Machine Learning.

[40]  L. Takac DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS , 2012 .

[41]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[42]  Leto Peel Supervised Blockmodelling , 2012, ArXiv.

[43]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[44]  Leto Peel,et al.  Active discovery of network roles for predicting the classes of network nodes , 2013, J. Complex Networks.

[45]  Daniel B. Larremore,et al.  Efficiently inferring community structure in bipartite networks , 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[46]  Mark E. J. Newman,et al.  Stochastic blockmodels and community structure in networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.