ALdataset: a benchmark for pool-based active learning

Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm can achieve good accuracy with fewer training samples by interactively querying a user or oracle to label new data points. Pool-based AL is well motivated in many ML tasks where unlabeled data is abundant but labels are hard to obtain. Although many pool-based AL methods have been developed, the lack of comparative benchmarks and integration of techniques makes it difficult to: 1) determine the current state-of-the-art method; 2) evaluate the relative benefit of new methods on datasets with different properties; 3) understand which specific problems merit greater attention; and 4) measure the progress of the field over time. To enable easier comparative evaluation among AL methods, we present a benchmark task for pool-based active learning, consisting of benchmarking datasets and quantitative metrics that summarize overall performance. We report experimental results for various active learning strategies, both recently proposed and classic highly cited methods, and draw insights from the results.
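To make the pool-based setting concrete, the sketch below shows one classic query strategy, uncertainty sampling, using scikit-learn. The synthetic dataset, model choice, and query budget are illustrative assumptions, not the configuration used in the benchmark itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Start with a small labeled seed set; the rest form the unlabeled pool.
labeled = list(rng.choice(len(X), size=10, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):  # query budget
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: query the pool point whose most probable
    # class has the lowest predicted probability (least confident).
    probs = model.predict_proba(X[pool])
    query = pool[int(np.argmin(np.max(probs, axis=1)))]
    labeled.append(query)  # the oracle reveals y[query]
    pool.remove(query)

accuracy = model.score(X, y)
```

The key property of the pool-based setting is visible here: the learner sees the entire unlabeled pool at once and ranks it before each query, which is exactly what the benchmark's strategies (uncertainty-, representativeness-, and learning-based) differ on.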
