Learning from Labeled and Unlabeled Vertices in Networks

Networks such as social networks, citation networks, protein-protein interaction networks, etc., are prevalent in real world. However, only very few vertices have labels compared to large amounts of unlabeled vertices. For example, in social networks, not every user provides his/her profile information such as the personal interests which are relevant for targeted advertising. Can we leverage the limited user information and friendship network wisely to infer the labels of unlabeled users? In this paper, we propose a semi-supervised learning framework called weighted-vote Geometric Neighbor classifier (wvGN) to infer the likely labels of unlabeled vertices in sparsely labeled networks. wvGN exploits random walks to explore not only local but also global neighborhood information of a vertex. Then the label of the vertex is determined by the accumulated local and global neighborhood information. Specifically, wvGN optimizes a proposed objective function by a search strategy which is based on the gradient and coordinate descent methods. The search strategy iteratively conducts a coarse search and a fine search to escape from local optima. Extensive experiments on various synthetic and real-world data verify the effectiveness of wvGN compared to state-of-the-art approaches.

[1]  Chih-Jen Lin,et al.  Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines , 2008, J. Mach. Learn. Res..

[2]  M. Narasimha Murty,et al.  Structural Neighborhood Based Classification of Nodes in a Network , 2016, KDD.

[3]  M. McPherson,et al.  Birds of a Feather: Homophily in Social Networks , 2001 .

[4]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  William W. Cohen,et al.  Semi-Supervised Classification of Network Data Using Very Few Labels , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[6]  Fan Chung,et al.  The heat kernel as the pagerank of a graph , 2007, Proceedings of the National Academy of Sciences.

[7]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[8]  William W. Cohen,et al.  A Very Fast Method for Clustering Big Text Datasets , 2010, ECAI.

[9]  Huan Liu,et al.  Scalable learning of collective behavior based on sparse social dimensions , 2009, CIKM.

[10]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2006, IEEE Transactions on Knowledge and Data Engineering.

[11]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[12]  David F. Gleich,et al.  Heat kernel based community detection , 2014, KDD.

[13]  Christos Faloutsos,et al.  Using ghost edges for classification in sparsely labeled networks , 2008, KDD.

[14]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[15]  Andrew McCallum,et al.  Introduction to Statistical Relational Learning , 2007 .

[16]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[17]  Huan Liu,et al.  Relational learning via latent social dimensions , 2009, KDD.

[18]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[19]  Konstantin Avrachenkov,et al.  Cooperative Game Theory Approaches for Network Partitioning , 2017, COCOON.

[20]  Foster J. Provost,et al.  Classification in Networked Data: a Toolkit and a Univariate Case Study , 2007, J. Mach. Learn. Res..

[21]  Olvi L. Mangasarian,et al.  A finite newton method for classification , 2002, Optim. Methods Softw..

[22]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[23]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[24]  Gita Reese Sukthankar,et al.  Multi-label relational neighbor classification using social context features , 2013, KDD.

[25]  Jian Pei,et al.  Within-Network Classification Using Radius-Constrained Neighborhood Patterns , 2014, CIKM.

[26]  Trevor Hastie,et al.  An Introduction to Statistical Learning , 2013, Springer Texts in Statistics.

[27]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[28]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[29]  Chih-Jen Lin,et al.  A Study on Threshold Selection for Multi-label Classification , 2007 .

[30]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[31]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.