Automated Label Noise Identification for Facial Attribute Recognition

Current state-of-the-art facial attribute recognition techniques use exceedingly deep convolutional neural networks (CNNs), which require large human-annotated datasets that are costly and time-consuming to collect. Most domains offer several large-scale datasets for researchers to work with. Facial attribute recognition has only one, CelebA, so researchers rely too heavily on this single set of data. While CelebA provides the scale necessary for training deep networks, the dataset contains several types of noise. We address the problem of label noise by introducing a novel multi-label verification framework that identifies mislabeled samples. Our work is applicable to data collection, cleaning, and multi-label verification. We use our method to analyze label noise in CelebA, and we perform extensive experiments with additive noise to show the efficacy of the proposed approach.
