Structural Neighborhood Based Classification of Nodes in a Network

Classification of entities based on the underlying network structure is an important problem. Networks encountered in practice are sparse and have many missing and noisy links. Statistical learning techniques have been used in intra-network classification; however, they typically exploit only the local neighborhood, so may not perform well. In this paper, we propose a novel structural neighborhood-based classifier learning using a random walk. For classifying a node, we take a random walk from the node and make a decision based on how nodes in the respective k^th-level neighborhood are labeled. We observe that random walks of short length are helpful in classification. Emphasizing role of longer random walks may cause the underlying Markov chain to converge to a stationary distribution. Considering this, we take a lazy random walk based approach with variable termination probability for each node, based on the node's structural properties including its degree. Our experimental study on real world datasets demonstrates the superiority of the proposed approach over the existing state-of-the-art approaches.

[1]  William W. Cohen,et al.  Semi-Supervised Classification of Network Data Using Very Few Labels , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[2]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[3]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[4]  Yizhou Sun,et al.  Graph Regularized Transductive Classification on Heterogeneous Information Networks , 2010, ECML/PKDD.

[5]  Jon M. Kleinberg,et al.  The link-prediction problem for social networks , 2007, J. Assoc. Inf. Sci. Technol..

[6]  Yizhou Sun,et al.  Mining heterogeneous information networks: a structural analysis approach , 2013, SKDD.

[7]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[8]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[9]  Piotr Indyk,et al.  Enhanced hypertext categorization using hyperlinks , 1998, SIGMOD '98.

[10]  Gita Reese Sukthankar,et al.  Multi-label relational neighbor classification using social context features , 2013, KDD.

[11]  Jiawei Han,et al.  Ranking-based classification of heterogeneous information networks , 2011, KDD.

[12]  Ravi Kumar,et al.  Influence and correlation in social networks , 2008, KDD.

[13]  Koby Crammer,et al.  New Regularized Algorithms for Transductive Learning , 2009, ECML/PKDD.

[14]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[15]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[16]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[17]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[18]  John D. Lafferty,et al.  Diffusion Kernels on Graphs and Other Discrete Input Spaces , 2002, ICML.

[19]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[20]  Foster Provost,et al.  A Simple Relational Classifier , 2003 .

[21]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[22]  Huan Liu,et al.  Relational learning via latent social dimensions , 2009, KDD.

[23]  Nello Cristianini,et al.  Learning Semantic Similarity , 2002, NIPS.

[24]  Huan Liu,et al.  Leveraging social media networks for classification , 2011, Data Mining and Knowledge Discovery.

[25]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[26]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[27]  L. Getoor,et al.  Link-Based Classification , 2003, Encyclopedia of Machine Learning and Data Mining.

[28]  Shankar Kumar,et al.  Video suggestion and discovery for youtube: taking random walks through the view graph , 2008, WWW.