Convolutional Neural Network-Based Discriminator for Outlier Detection

Noise in training data increases the tendency of many machine learning methods to overfit, which undermines their performance. Outliers arise in large datasets for various reasons, including human error. In this work, we present a novel discriminator model for identifying outliers in training data. We propose a systematic approach for constructing the datasets used to train the discriminator from a small number of genuine instances (trusted data). The noise discriminator is a convolutional neural network (CNN). We evaluate its performance on several benchmark datasets at different noise ratios: we inserted random noise into each dataset and trained discriminators to clean them. Different discriminators were trained using different numbers of genuine instances, with and without data augmentation. We compare the proposed noise-discriminator method with seven other methods from the literature on several benchmark datasets. Our empirical results indicate that the proposed method is competitive with the other methods and outperforms them for pair noise.
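
To make the evaluation setup concrete, the following is a minimal sketch of how label noise at a given noise ratio might be injected into a benchmark dataset. The abstract does not specify the exact protocol, so this is an assumption based on common usage in the noisy-label literature: "symmetric" noise flips a label to a different class chosen uniformly at random, and "pair" noise flips a label to the next class index. The function name `inject_label_noise` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def inject_label_noise(labels, num_classes, noise_ratio, kind="symmetric", seed=0):
    """Corrupt a fraction `noise_ratio` of the labels (illustrative only).

    Returns the noisy labels and a boolean mask marking corrupted entries.
    The noise definitions are assumptions, not the paper's stated protocol.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Choose which instances to corrupt, without repetition.
    n_flip = int(round(noise_ratio * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)

    if kind == "symmetric":
        # An offset in [1, num_classes - 1] guarantees a different class.
        offsets = rng.integers(1, num_classes, size=n_flip)
        noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    elif kind == "pair":
        # Flip each selected label to the next class, wrapping around.
        noisy[flip_idx] = (labels[flip_idx] + 1) % num_classes
    else:
        raise ValueError(f"unknown noise kind: {kind!r}")

    is_noisy = np.zeros(len(labels), dtype=bool)
    is_noisy[flip_idx] = True
    return noisy, is_noisy
```

The returned mask gives ground truth for which instances were corrupted, which is what a noise discriminator's predictions would be scored against in an experiment of this kind.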
