On the Labeling Correctness in Computer Vision Datasets

Image datasets are widely used to build computer vision systems. These datasets are labeled either manually or automatically, which is a problem because both labeling methods are prone to errors. To investigate this problem, we use a majority-voting ensemble that combines the predictions of several Convolutional Neural Networks (CNNs). Majority-voting ensembles not only improve overall performance but can also be used to estimate the confidence level of each sample. We also examined Softmax as an alternative way to estimate the posterior probability. We designed experiments with a range of ensembles built from a single architecture, from different architectures, or from temporal (snapshot) CNNs, each trained multiple times stochastically. Analyzing the CIFAR10, CIFAR100, EMNIST, and SVHN datasets, we found quite a few incorrect labels in both the training and testing sets. We also present a detailed confidence analysis on these datasets and find that the ensemble outperforms Softmax when used to estimate per-sample confidence. This work thus proposes an approach for scrutinizing and verifying the labeling of computer vision datasets, which can later be applied to weakly/semi-supervised learning. We further propose a measure, based on the odds ratio, to quantify how many of the incorrectly classified samples are actually mislabeled and how many are merely confusing. The proposed methods scale easily to larger datasets such as ImageNet, LSUN, and SUN, since each CNN instance is trained for only 60 epochs; training can be made even faster by using a temporal (snapshot) ensemble.
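The core idea above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's implementation): given the class predictions of several ensemble members, the majority vote gives the ensemble's label, and the fraction of members agreeing with that vote serves as a per-sample confidence estimate. Samples whose given label disagrees with a high-confidence ensemble vote are candidates for incorrect labeling.

```python
import numpy as np

def majority_vote_confidence(preds: np.ndarray):
    """Hypothetical sketch of per-sample confidence from a majority-voting ensemble.

    preds: (n_members, n_samples) array of integer class predictions,
           one row per ensemble member.
    Returns the voted labels and the agreement fraction per sample.
    """
    n_members, n_samples = preds.shape
    voted = np.empty(n_samples, dtype=preds.dtype)
    confidence = np.empty(n_samples)
    for i in range(n_samples):
        classes, counts = np.unique(preds[:, i], return_counts=True)
        winner = np.argmax(counts)
        voted[i] = classes[winner]
        # Confidence = fraction of members agreeing with the majority class.
        confidence[i] = counts[winner] / n_members
    return voted, confidence

# Three ensemble members predicting on three samples:
preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [0, 2, 1]])
labels, conf = majority_vote_confidence(preds)
# labels -> [0, 1, 1]; conf -> [1.0, 0.667, 0.667]
```

In this sketch, a sample where all members agree (confidence 1.0) but the dataset label differs would be flagged for inspection; the function names and thresholds here are illustrative assumptions, not the paper's exact procedure.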
