Investigating the Consistency of Uncertainty Sampling in Deep Active Learning

Uncertainty sampling is a widely used active learning strategy for selecting unlabeled examples for annotation. However, previous work hints at weaknesses of uncertainty sampling when it is combined with deep learning, where the amount of data matters even more. To investigate these problems, we analyze the properties of the latent statistical estimators of uncertainty sampling in simple scenarios. We prove that uncertainty sampling converges towards some decision boundary. Additionally, we show that it can be inconsistent, leading to incorrect estimates of the optimal latent boundary. The inconsistency depends on the latent class distribution, and more specifically on the class overlap. Further, we empirically analyze the variance of the decision boundary and find that the performance of uncertainty sampling is also connected to the overlap of the class regions. We argue that our findings could be a first step towards explaining the poor performance of uncertainty sampling combined with deep models.
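To make the setting concrete, the following is a minimal sketch of pool-based uncertainty sampling in one dimension. It is not the paper's experimental setup: the two overlapping Gaussian classes, the midpoint-of-class-means boundary estimator, and the names `query_most_uncertain` and `run_uncertainty_sampling` are all assumptions chosen for illustration. At each step the current boundary is estimated from the labeled points only, and the unlabeled point closest to that boundary (the most uncertain one) is queried next.

```python
import numpy as np


def query_most_uncertain(pool_x, boundary, labeled_mask):
    """Return the index of the unlabeled pool point closest to the boundary."""
    dist = np.abs(pool_x - boundary)
    # Already-labeled points must never be queried again.
    dist = np.where(labeled_mask, np.inf, dist)
    return int(np.argmin(dist))


def run_uncertainty_sampling(pool_x, pool_y, init_idx, n_queries):
    """Track the estimated boundary while querying the most uncertain points.

    Assumption: the boundary estimator is the midpoint of the two labeled
    class means, a stand-in for whatever estimator a real model would use.
    """
    labeled = np.zeros(len(pool_x), dtype=bool)
    labeled[init_idx] = True
    boundaries = []
    for _ in range(n_queries):
        mu0 = pool_x[labeled & (pool_y == 0)].mean()
        mu1 = pool_x[labeled & (pool_y == 1)].mean()
        boundary = 0.5 * (mu0 + mu1)
        boundaries.append(boundary)
        labeled[query_most_uncertain(pool_x, boundary, labeled)] = True
    return np.array(boundaries), labeled


# Hypothetical toy data: two overlapping unit-variance Gaussian classes.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
y = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])

# Seed the labeled set with two points from each class.
boundaries, labeled = run_uncertainty_sampling(x, y, [0, 1, 500, 501], n_queries=50)
```

Comparing the final entries of `boundaries` against the midpoint of the full-pool class means, for different amounts of class overlap, is one way to probe the kind of convergence and inconsistency behavior the abstract describes.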
