Selective Labeling via Error Bound Minimization

In many practical machine learning problems, acquiring labeled data is expensive and/or time-consuming. This motivates the following problem: given a label budget, how should we select data points to label so that learning performance is optimized? We propose a selective labeling method based on an analysis of the out-of-sample error of Laplacian Regularized Least Squares (LapRLS). In particular, we derive a deterministic out-of-sample error bound for LapRLS trained on subsampled data, and propose to select the subset of data points to label by minimizing this upper bound. Since the minimization is a combinatorial problem, we relax it to the continuous domain and solve it by projected gradient descent. Experiments on benchmark datasets show that the proposed method outperforms state-of-the-art methods.
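As a concrete illustration of the relax-and-project strategy described above, the sketch below runs projected gradient descent over a relaxed selection vector s ∈ [0,1]^n constrained to sum to the label budget. The abstract does not state the paper's actual error bound, so the surrogate objective f(s) = tr((XᵀD(s)X + λI)⁻¹) used here is a hypothetical A-optimal-design-style stand-in, and the function names (`project_capped_simplex`, `select_by_bound_minimization`) are illustrative, not from the paper.

```python
import numpy as np

def project_capped_simplex(v, budget, iters=60):
    """Project v onto {s : 0 <= s_i <= 1, sum_i s_i = budget} by bisecting
    on the dual shift nu so that sum(clip(v - nu, 0, 1)) == budget."""
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.clip(v - nu, 0.0, 1.0).sum() > budget:
            lo = nu
        else:
            hi = nu
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def select_by_bound_minimization(X, budget, lam=1e-2, lr=0.5, steps=200):
    """Relax the 0/1 selection indicator to s in [0,1]^n with sum(s) = budget
    and minimize the stand-in objective f(s) = tr((X^T diag(s) X + lam I)^-1)
    by projected gradient descent (NOT the paper's actual bound)."""
    n, d = X.shape
    s = project_capped_simplex(np.full(n, budget / n), budget)
    I = np.eye(d)
    for _ in range(steps):
        M_inv = np.linalg.inv(X.T @ (s[:, None] * X) + lam * I)
        # df/ds_i = -x_i^T M^{-2} x_i  (each component is nonpositive)
        grad = -np.einsum('ij,jk,ik->i', X @ M_inv, M_inv, X)
        s = project_capped_simplex(s - lr * grad, budget)
    # Round the relaxed solution: label the points with the largest scores.
    return np.argsort(-s)[:int(budget)]

# Toy usage: pick 10 of 100 random points to label.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
print(select_by_bound_minimization(X, budget=10))
```

The projection step is the standard Euclidean projection onto the capped simplex, computed by bisection on the Lagrange multiplier; any equivalent projection routine would do.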
