Many real-world datasets can be represented as graphs whose edge weights encode similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the predictive covariance Σ (V-optimality) on GRFs has proven successful in batch active learning for classification under budget constraints; however, a worst-case guarantee has been missing. We show that V-optimality on GRFs, as a function of the batch query set, is submodular, and hence greedy selection guarantees a (1 − 1/e) approximation ratio. Moreover, GRF models satisfy the absence-of-suppressor (AofS) condition. For active survey problems, we propose an analogous survey criterion that minimizes 1ᵀΣ1. In practice, the V-optimality criterion outperforms GPs with mutual-information-gain criteria and allows nonuniform costs for different nodes.
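The greedy V-optimality procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the GRF predictive covariance over the unlabeled nodes is the inverse of the (regularized) Laplacian restricted to those nodes, and at each step it queries the node that most reduces the trace of that covariance. The function name and the `delta` regularizer are illustrative choices.

```python
import numpy as np

def greedy_v_optimal(L, budget, delta=1e-2):
    """Greedily select a batch of nodes to query under the V-optimality criterion.

    L      : (n x n) graph Laplacian.
    budget : number of nodes to select.
    delta  : ridge term so L + delta*I is invertible (assumption for this sketch).
    Returns the list of selected node indices.
    """
    n = L.shape[0]
    Lr = L + delta * np.eye(n)  # regularized Laplacian (prior precision)
    selected = []
    remaining = set(range(n))
    for _ in range(budget):
        best, best_trace = None, np.inf
        for v in remaining:
            # Unlabeled nodes if v were added to the query batch.
            U = sorted(remaining - {v})
            if not U:
                trace = 0.0
            else:
                # Predictive covariance of unlabeled nodes: (Lr restricted to U)^{-1}.
                trace = np.trace(np.linalg.inv(Lr[np.ix_(U, U)]))
            if trace < best_trace:  # smaller remaining variance is better
                best, best_trace = v, trace
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the trace reduction is submodular, this greedy loop inherits the (1 − 1/e) approximation guarantee; a practical implementation would use rank-one covariance updates rather than refactoring the inverse at every candidate.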