Improving the Performance of Fine-Grain Image Classifiers via Generative Data Augmentation

Recent advances in machine learning (ML) and computer vision tools have enabled applications in a wide variety of arenas such as financial analytics, medical diagnostics, and even within the Department of Defense. However, their widespread implementation in real-world use cases poses several challenges: (1) many applications are highly specialized, and hence operate in a \emph{sparse data} domain; (2) ML tools are sensitive to their training sets and typically require cumbersome, labor-intensive data collection and data labelling processes; and (3) ML tools can be extremely "black box," offering users little to no insight into the decision-making process or how new data might affect prediction performance. To address these challenges, we have designed and developed Data Augmentation from Proficient Pre-Training of Robust Generative Adversarial Networks (DAPPER GAN), an ML analytics support tool that automatically generates novel views of training images in order to improve downstream classifier performance. DAPPER GAN leverages high-fidelity embeddings generated by a StyleGAN2 model (trained on the LSUN cars dataset) to create novel imagery for previously unseen classes. We experimentally evaluate this technique on the Stanford Cars dataset, demonstrating improved vehicle make and model classification accuracy and reduced requirements for real data using our GAN based data augmentation framework. The method's validity was supported through an analysis of classifier performance on both augmented and non-augmented datasets, achieving comparable or better accuracy with up to 30\% less real data across visually similar classes. To support this method, we developed a novel augmentation method that can manipulate semantically meaningful dimensions (e.g., orientation) of the target object in the embedding space.

[1]  Artem Babenko,et al.  Unsupervised Discovery of Interpretable Directions in the GAN Latent Space , 2020, ICML.

[2]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[3]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[4]  Pavel Zemcík,et al.  Real-Time Pose Estimation Piggybacked on Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Mark D. McDonnell,et al.  Understanding Data Augmentation for Classification: When to Warp? , 2016, 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[6]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[7]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[8]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[9]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Amos J. Storkey,et al.  Data Augmentation Generative Adversarial Networks , 2017, ICLR 2018.

[11]  Yoshua Bengio,et al.  Generative Adversarial Networks , 2014, ArXiv.

[12]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[13]  Luis Perez,et al.  The Effectiveness of Data Augmentation in Image Classification using Deep Learning , 2017, ArXiv.

[14]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[16]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).