Analysis of a greedy active learning strategy

We abstract out the core search problem underlying active learning schemes in order to better understand the extent to which adaptive labeling can improve sample complexity. We give upper and lower bounds on the number of labels that must be queried, and we prove that a popular greedy active learning rule is approximately as good as any other strategy for minimizing this number of labels.
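
To make the search problem concrete, the following is a minimal sketch and not the paper's own formulation. It assumes a finite hypothesis class, a finite pool of unlabeled points, a noiseless labeling oracle, and a uniform weighting over hypotheses, and it takes the "greedy rule" to mean: query the pool point that most evenly splits the hypotheses still consistent with the labels seen so far. The names `make_threshold`, `greedy_query`, and `greedy_active_learn` are hypothetical, chosen for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis maps a point to a binary label (assumption: binary classification).
Hypothesis = Callable[[float], int]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical 1-D threshold classifier: label 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0


def greedy_query(version_space: Sequence[Hypothesis], pool: Sequence[float]) -> float:
    """Pick the pool point whose worst-case surviving hypothesis set is smallest.

    Assumes uniform weight over hypotheses, so "most even split" is a count.
    """
    def worst_case(x: float) -> int:
        positives = sum(h(x) for h in version_space)
        return max(positives, len(version_space) - positives)

    return min(pool, key=worst_case)


def greedy_active_learn(
    hypotheses: Sequence[Hypothesis],
    pool: Sequence[float],
    oracle: Callable[[float], int],
) -> Tuple[List[Hypothesis], int]:
    """Query labels greedily until no pool point separates the survivors."""
    version_space = list(hypotheses)
    unqueried = list(pool)
    queries = 0
    while unqueried:
        x = greedy_query(version_space, unqueried)
        positives = sum(h(x) for h in version_space)
        if positives in (0, len(version_space)):
            # Even the best point splits nothing, so no remaining query helps.
            break
        y = oracle(x)  # one label request
        queries += 1
        version_space = [h for h in version_space if h(x) == y]
        unqueried.remove(x)
    return version_space, queries


if __name__ == "__main__":
    pool = [float(i) for i in range(16)]
    # 17 thresholds, each inducing a distinct labeling of the pool.
    hypotheses = [make_threshold(i - 0.5) for i in range(17)]
    target = make_threshold(10.5)  # stands in for the unknown true labeling
    survivors, n_queries = greedy_active_learn(hypotheses, pool, target)
    print(len(survivors), n_queries)
```

For one-dimensional thresholds this greedy rule reduces to binary search over the pool, so it identifies the target labeling with far fewer label requests than labeling all 16 points. Other hypothesis classes need not split this evenly, which is where bounds on the number of queried labels become relevant.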
