Online Spectral Learning on a Graph with Bandit Feedback

Online learning on a graph is appealing due to its efficiency. However, existing online learning algorithms on a graph are limited to binary classification. Moreover, they require accessing the full label information, where the label oracle needs to return the true class label after the learner makes classification of each node. In many application scenarios, we only have access to partial label information, where the label oracle will return a single bit indicating whether the prediction is correct or not, instead of the true class label. This is also known as bandit feedback. In this paper, to overcome the above limitations of existing online learning algorithms on a graph, we study online learning on a graph for multi-class node classification, in both the full information setting and the partial information setting. First, we present an online multi-class classification algorithm in the full information setting. It is based on function learning on a graph using the spectral information of the graph Laplacian. We show that it attains O (cd log T) regret bound, where T is the number of rounds in online learning, c is the number of classes, and d is the number of eigenvectors of the graph Laplacian used for learning. Second, we present an online multi-class classification algorithm with bandit feedback. We use upper-confidence bound technique to trade off the exploration and exploitation of label information. We show that it attains O (cd √T log T) regret bound, which is only a √T factor worse than the proposed algorithm in the full information setting. Experiments on several benchmark graph datasets show that the proposed online multi-class classification algorithm beats the state-of-art baseline, and the proposed bandit algorithm is also much better than the bandit version of the baseline.

[1]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[2]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[3]  Wei Chu,et al.  Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.

[4]  Ambuj Tewari,et al.  Efficient bandit algorithms for online multiclass prediction , 2008, ICML '08.

[5]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[6]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[7]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[8]  Koby Crammer,et al.  Multiclass classification with bandit feedback using adaptive regularization , 2012, Machine Learning.

[9]  Rong Jin,et al.  A Simple Algorithm for Semi-supervised Learning with Improved Generalization Error Bound , 2012, ICML.

[10]  Zoubin Ghahramani,et al.  Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning , 2004, NIPS.

[11]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[12]  Charu C. Aggarwal,et al.  Selective sampling on graphs for classification , 2013, KDD.

[13]  Mark Herbster,et al.  Online learning over graphs , 2005, ICML.

[14]  Guy Lever,et al.  Predicting the Labelling of a Graph via Minimum $p$-Seminorm Interpolation , 2009, COLT.

[15]  Rong Jin,et al.  A Potential-based Framework for Online Multi-class Learning with Partial Feedback , 2010, AISTATS.

[16]  Guy Lever,et al.  Online Prediction on Large Diameter Graphs , 2008, NIPS.

[17]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[18]  Mark Herbster,et al.  Prediction on a Graph with a Perceptron , 2006, NIPS.

[19]  Claudio Gentile,et al.  A Second-Order Perceptron Algorithm , 2002, SIAM J. Comput..

[20]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[21]  Steven C. H. Hoi,et al.  Exact Soft Confidence-Weighted Learning , 2012, ICML.