Active semi-supervised learning using sampling theory for graph signals

We consider the problem of offline, pool-based active semi-supervised learning on graphs. This problem is important when labeled data are scarce and expensive to obtain while unlabeled data are plentiful. The data points are represented by the vertices of an undirected graph, and the similarity between them is captured by the edge weights. Given a target number of nodes to label, the goal is to choose the most informative nodes and then predict the unknown labels. We propose a novel framework for this problem based on our recent results on sampling theory for graph signals. A graph signal is a real-valued function defined on the nodes of the graph. A notion of frequency for such signals can be defined using the spectrum of the graph Laplacian matrix. Sampling theory for graph signals extends the traditional Nyquist-Shannon sampling theory by identifying the class of graph signals that can be reconstructed from their values on a subset of vertices. This lets us define an active learning criterion based on sampling set selection: choose the set of nodes that maximizes the cutoff frequency, that is, the highest frequency a signal may contain and still be reconstructed from its samples on that set. Experiments show the effectiveness of our method.
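To make the sampling-set criterion concrete, here is a minimal NumPy sketch under stated assumptions. It uses the combinatorial Laplacian L = D - W and estimates the cutoff frequency of a sampling set S with a spectral proxy: the k-th root of the smallest eigenvalue of L^k restricted to the unsampled nodes. This estimate, the brute-force greedy search, and the least-squares reconstruction are illustrative choices, not details given in this abstract. The function names `laplacian`, `cutoff_estimate`, `greedy_sampling_set` and `reconstruct` are hypothetical. The estimate is useful because a nonzero signal that vanishes on S can never have all of its Laplacian spectrum below it. Consequently, any signal whose spectrum lies strictly below the estimate is determined by its values on S.

```python
import numpy as np


def laplacian(W):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W


def cutoff_estimate(L, S, k=2):
    """Assumed spectral-proxy estimate of the cutoff frequency of sampling set S:
    the k-th root of the smallest eigenvalue of L^k restricted to unsampled nodes."""
    n = L.shape[0]
    unsampled = np.setdiff1d(np.arange(n), np.asarray(S, dtype=int))
    if unsampled.size == 0:
        return np.inf  # every node is sampled, so any signal is recoverable
    Lk = np.linalg.matrix_power(L, k)
    sub = Lk[np.ix_(unsampled, unsampled)]
    smallest = np.linalg.eigvalsh(sub)[0]  # eigenvalues come back in ascending order
    return max(smallest, 0.0) ** (1.0 / k)  # clip tiny negative round-off


def greedy_sampling_set(L, m, k=2):
    """Brute-force greedy: repeatedly add the node that maximizes the estimate."""
    S = []
    for _ in range(min(m, L.shape[0])):
        candidates = [v for v in range(L.shape[0]) if v not in S]
        best = max(candidates, key=lambda v: cutoff_estimate(L, S + [v], k))
        S.append(best)
    return S


def reconstruct(L, S, y_S, bandwidth):
    """Least-squares fit of a signal in the span of the `bandwidth` lowest-frequency
    Laplacian eigenvectors to the observed samples y_S on nodes S."""
    _, U = np.linalg.eigh(L)
    U_low = U[:, :bandwidth]
    coeffs, *_ = np.linalg.lstsq(U_low[S, :], y_S, rcond=None)
    return U_low @ coeffs
```

For label prediction, one could treat a class-indicator vector as the graph signal, reconstruct it from the labeled nodes, and threshold the result. The greedy loop above recomputes an eigendecomposition for every candidate node, so it only suits small graphs.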
