Sparsification and Sampling of Networks for Collective Classification

Network analysis has been an active area of research for the past few decades. Relational classification, community detection, and link prediction are only a few of the open research questions that have been studied extensively. Collective classification is a well-known relational classification method for classifying entities (nodes) within a network; it uses both the node-based features and the topological features of each node, and it collectively predicts the unknown labels of all test nodes in the network from the label information of the training nodes. Although this topic has been well researched for years, very little has been done to address the following two challenges: (1) how to actively select the labeled nodes from the network to be used for training, and (2) how to efficiently obtain a sparse representation of the original network without losing much information, so that learning can scale to large networks. Much work in theoretical computer science aims at finding the best approximation of large graphs. However, not much has been done on finding an approximate subgraph that helps in the classification of network datasets. In this paper, we propose an efficient graph sparsification method and a sampling technique which, used together with state-of-the-art network classifiers, give comparable runtimes and classification accuracies.
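
To make the idea of collective prediction concrete, the following is a minimal sketch of one simple variant: unlabeled nodes repeatedly take the majority label of their neighbors until no label changes. It is an illustration only and not the method proposed in this paper. It uses neighbor labels alone and ignores node-based features, and the function name `collective_classify`, its parameters, and the toy graph are all hypothetical.

```python
from collections import Counter


def collective_classify(adjacency, known_labels, max_iters=10):
    """Label unknown nodes by iterated majority vote over neighbor labels.

    adjacency:    dict mapping each node to a list of its neighbors
    known_labels: dict mapping training nodes to their (fixed) labels
    Hypothetical helper for illustration; not the paper's classifier.
    """
    labels = dict(known_labels)  # training labels are never overwritten
    unknown = [n for n in adjacency if n not in known_labels]

    for _ in range(max_iters):
        changed = False
        for node in unknown:
            # Count the current labels of neighbors that have a label.
            votes = Counter(
                labels[nb] for nb in adjacency[node] if nb in labels
            )
            if not votes:
                continue  # no labeled neighbor yet; try again next pass
            # Most votes wins; ties are broken by label order (assumption).
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break  # predictions are stable
    return labels


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adjacency = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
known_labels = {0: "A", 1: "A", 4: "B", 5: "B"}

result = collective_classify(adjacency, known_labels)
print(result[2], result[3])
```

In this toy graph the two test nodes (2 and 3) each end up with the label that dominates their own triangle. A classifier of the kind described above would typically also feed node-based features into a learned local model rather than relying on a plain vote.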
