Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from an oracle or annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains samples that do not belong to the classes of the target task. Achieving good AL performance under OOD data scenarios is challenging because AL sampling strategies naturally conflict with OOD sample detection: AL selects data that are hard for the current basic classifier to classify (e.g., samples whose predicted class probabilities have high entropy), while OOD samples tend to have more uniform predicted class probabilities (i.e., higher entropy) than in-distribution (ID) data. In this paper, we propose a sampling scheme, Monte-Carlo Pareto Optimization for Active Learning (POAL), which selects optimal subsets of unlabeled samples of fixed batch size from the unlabeled data pool. We cast the AL sampling task as a multi-objective optimization problem and apply Pareto optimization to two conflicting objectives: (1) the normal AL data sampling scheme (e.g., maximum entropy), and (2) the confidence of not being an OOD sample. Experimental results show the effectiveness of POAL on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
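The conflict between the two objectives can be illustrated with a minimal sketch. It is not the POAL algorithm itself, which searches over fixed-size subsets with a Monte-Carlo scheme; it only scores individual samples on two objectives and keeps those that no other sample dominates. The function names, the predicted probabilities, and the ID-confidence values in `id_conf` are all hypothetical and chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Indices of score vectors that no other vector dominates (larger is better)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Hypothetical predicted class probabilities for four unlabeled samples.
probs = [
    [0.34, 0.33, 0.33],  # near-uniform: highest entropy, but could be OOD
    [0.50, 0.45, 0.05],  # ambiguous between two target classes
    [0.98, 0.01, 0.01],  # confidently classified
    [0.70, 0.20, 0.10],  # moderately uncertain
]
# Hypothetical confidence that each sample is in-distribution (assumed given,
# e.g. by some separate OOD detector).
id_conf = [0.10, 0.90, 0.95, 0.30]

# Objective 1: AL informativeness (entropy). Objective 2: ID confidence.
scores = [(entropy(p), c) for p, c in zip(probs, id_conf)]
front = pareto_front(scores)
print(front)
```

In this toy data the sample with the highest entropy also has the lowest ID confidence, so ranking by entropy alone would pick the sample most likely to be OOD. The non-dominated set keeps it only as one trade-off among several, and drops the fourth sample, which the second sample beats on both objectives.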
