Improving Active Learning by Avoiding Ambiguous Samples

If label information in a classification task is expensive to obtain, active learning can help by selecting the most informative samples for a human to label. However, some samples may be meaningless to the human or recorded incorrectly. If such samples lie near the classifier's decision boundary, they are queried for labeling repeatedly. This is inefficient for training because the human cannot label them correctly, and it may reduce the human's acceptance of the system. We introduce an approach that mitigates the problem of ambiguous samples by excluding clustered samples from labeling. We compare this approach to other state-of-the-art methods. We further show that it improves accuracy in active learning and reduces the number of ambiguous samples queried during training.
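
To make the idea of excluding samples from querying concrete, the following is a minimal sketch, not the method of the paper. The abstract does not specify how clusters of ambiguous samples are formed, so this sketch assumes a simple stand-in: any pool sample within a distance `eps` of a sample the annotator could not label is skipped, and the remaining sample with the smallest margin between its two highest class probabilities is queried. The function name `select_query`, the parameter `eps`, and the margin-based uncertainty measure are all illustrative assumptions.

```python
import numpy as np


def select_query(features, probs, rejected_idx, eps=0.5):
    """Pick the most uncertain pool sample that is not close to a rejected one.

    features:     (n_samples, n_dims) pool features
    probs:        (n_samples, n_classes) predicted class probabilities
    rejected_idx: indices of samples the annotator could not label
    eps:          exclusion radius (hypothetical parameter, assumed here)
    Returns an index into the pool, or None if every sample is excluded.
    """
    features = np.asarray(features, dtype=float)
    probs = np.asarray(probs, dtype=float)

    # Uncertainty: margin between the two highest class probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]

    # Assumed exclusion rule: drop everything within eps of a rejected sample.
    allowed = np.ones(len(features), dtype=bool)
    if len(rejected_idx) > 0:
        rejected = features[list(rejected_idx)]
        diffs = features[:, None, :] - rejected[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        allowed = dists.min(axis=1) > eps

    if not allowed.any():
        return None
    candidates = np.flatnonzero(allowed)
    return int(candidates[np.argmin(margin[candidates])])
```

In this sketch, a sample that sits right next to an already rejected one is never queried, even if it is the most uncertain one in the pool. A density-based clustering step could replace the fixed radius, but that choice is likewise an assumption rather than something stated in the abstract.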
