Selective sampling on graphs for classification

Selective sampling is an active variant of online learning in which the learner may adaptively query the label of each observed example. Its goal is to achieve a good trade-off between prediction performance and the number of queried labels. Existing selective sampling algorithms are designed for vector-based data. In this paper, motivated by the ubiquity of graph representations in real-world applications, we study selective sampling on graphs. We first present an online version of the well-known Learning with Local and Global Consistency method (OLLGC). It is essentially a second-order online learning algorithm, and can be seen as online ridge regression in the Hilbert space of functions defined on graphs. We prove a regret bound for it in terms of a structural property of the graph, its cut size. Based on OLLGC, we present a selective sampling algorithm, Selective Sampling with Local and Global Consistency (SSLGC), which decides whether to query the label of each node based on the confidence of the linear function on graphs. We also derive a bound on its label complexity. We analyze the low-rank approximation of graph kernels, which enables the online algorithms to scale to large graphs. Experiments on benchmark graph datasets show that OLLGC significantly outperforms the state-of-the-art first-order algorithm, and that SSLGC achieves comparable or even better results than OLLGC while querying substantially fewer nodes. Moreover, SSLGC performs far better than random sampling.
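To make the pipeline described above concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the kernel K = (I - alpha*S)^{-1} from Learning with Local and Global Consistency, where S is the symmetrically normalized adjacency matrix, and runs online ridge regression on feature vectors whose inner products reproduce K. The query rule (query with probability b / (b + |margin|)) is a common randomized margin-based rule from the selective sampling literature, used here as a stand-in because the abstract does not state SSLGC's exact rule. The function names `llgc_features` and `selective_sampling` and the parameters `alpha`, `rank`, and `b` are hypothetical.

```python
import numpy as np

def llgc_features(W, alpha=0.9, rank=None):
    """Return Phi with Phi @ Phi.T = (I - alpha*S)^{-1} (or its top-`rank` part).

    Assumes W is a symmetric adjacency matrix with no isolated nodes.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    evals, evecs = np.linalg.eigh(S)                    # eigenvalues lie in [-1, 1]
    k_evals = 1.0 / (1.0 - alpha * evals)               # positive for 0 < alpha < 1
    if rank is not None:
        top = np.argsort(k_evals)[-rank:]               # keep the largest kernel eigenvalues
        evecs, k_evals = evecs[:, top], k_evals[top]
    return evecs * np.sqrt(k_evals)

def selective_sampling(X, y, order, b=1.0, seed=0):
    """Online ridge regression with a randomized margin-based query rule (assumed)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    A_inv = np.eye(dim)        # inverse of I + sum of x x^T over queried nodes
    v = np.zeros(dim)          # sum of y * x over queried nodes
    mistakes = queries = 0
    for t in order:
        x = X[t]
        margin = x @ A_inv @ v
        pred = 1 if margin >= 0 else -1
        mistakes += int(pred != y[t])
        # Low-confidence (small |margin|) nodes are queried more often.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            Ax = A_inv @ x
            A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)  # Sherman-Morrison update
            v += y[t] * x
    return mistakes, queries

# Toy graph: two 10-node cliques joined by a single bridge edge.
n = 20
W = np.zeros((n, n))
W[:10, :10] = 1.0
W[10:, 10:] = 1.0
np.fill_diagonal(W, 0.0)
W[9, 10] = W[10, 9] = 1.0
y = np.array([1] * 10 + [-1] * 10)

Phi = llgc_features(W, alpha=0.9, rank=5)
order = np.random.default_rng(1).permutation(n)
mistakes, queries = selective_sampling(Phi, y, order)
print(f"mistakes={mistakes}, queries={queries} of {n} nodes")
```

Passing `rank` keeps only the leading eigenpairs of the kernel, which is one simple way to realize a low-rank approximation; each update then costs O(rank^2) rather than O(n^2).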
