Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

Active learning (AL) aims to minimize labeling effort for data-hungry deep neural networks (DNNs) by selecting the most representative data points for annotation. However, current methods are ill-equipped to deal with biased data. The main motivation of this paper is to consider a realistic setting for pool-based semi-supervised AL in which the unlabeled collection of training data is biased. We theoretically derive an optimal acquisition function for AL in this setting; it can be formulated as distribution-shift minimization between the unlabeled training data and a weakly labeled validation dataset. To implement such an acquisition function, we propose a low-complexity method for feature-density matching using a self-supervised Fisher kernel (FK), along with several novel pseudo-label estimators. Our FK-based method outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only one-tenth of their processing. Experiments show at least a 40% reduction in labeling effort on biased, class-imbalanced data compared to existing methods.
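To make the acquisition idea concrete, below is a minimal, hypothetical sketch of Fisher-kernel-based sample selection, not the authors' implementation. It assumes a PyTorch model exposing a final layer `model.fc`, a per-sample self-supervised loss `loss_fn` (e.g., rotation prediction), and lists `pool` / `val` of unlabeled and validation examples; the Fisher information matrix is approximated by the identity, and each unlabeled sample is scored by its kernel similarity to the validation set.

```python
# Illustrative sketch: Fisher-kernel acquisition for pool-based active learning.
# All names (model, loss_fn, pool, val, budget) are assumptions for this example.
import torch

def fisher_embedding(model, loss_fn, x):
    """Per-sample gradient of a self-supervised loss w.r.t. the last layer,
    used as a Fisher score vector (identity-Fisher approximation)."""
    loss = loss_fn(model, x)                      # scalar loss for one sample
    params = list(model.fc.parameters())          # restrict to last layer for tractability
    grads = torch.autograd.grad(loss, params)
    g = torch.cat([g.flatten() for g in grads])
    return g / (g.norm() + 1e-8)                  # normalize so the kernel is a cosine similarity

def select_batch(model, loss_fn, pool, val, budget):
    """Pick the pool samples whose Fisher scores best match the validation set."""
    val_mean = torch.stack([fisher_embedding(model, loss_fn, x) for x in val]).mean(0)
    scores = torch.stack([fisher_embedding(model, loss_fn, x) @ val_mean for x in pool])
    return scores.topk(budget).indices            # indices of pool samples to annotate
```

The key design choice illustrated here is that the kernel is built from gradients of a self-supervised objective, so no labels from the pool are needed to measure how well a candidate sample reduces the distribution shift toward the validation set.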
