ADA: Adversarial Data Augmentation for Object Detection

The use of random perturbations of ground truth data, such as random translation or scaling of bounding boxes, is a common heuristic used for data augmentation that has been shown to prevent overfitting and improve generalization. Since the design of data augmentation is largely guided by reported best practices, it is difficult to understand if those design choices are optimal. To provide a more principled perspective, we develop a game-theoretic interpretation of data augmentation in the context of object detection. We aim to find an optimal adversarial perturbations of the ground truth data (i.e., the worst case perturbations) that forces the object bounding box predictor to learn from the hardest distribution of perturbed examples for better test-time performance. We establish that the game-theoretic solution (Nash equilibrium) provides both an optimal predictor and optimal data augmentation distribution. We show that our adversarial method of training a predictor can significantly improve test-time performance for the task of object detection. On the ImageNet, Pascal VOC and MS-COCO object detection tasks, our adversarial approach improves performance by about 16%, 5%, and 2% respectively compared to the best performing data augmentation methods.

[1]  Xinhua Zhang,et al.  Efficient and Consistent Adversarial Bipartite Matching , 2018, ICML.

[2]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[3]  Brian D. Ziebart,et al.  Adversarial Cost-Sensitive Classification , 2015, UAI.

[4]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[5]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[6]  Hao Su,et al.  Crowdsourcing Annotations for Visual Object Detection , 2012, HCOMP@AAAI.

[7]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[8]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Wei Xing,et al.  ARC: Adversarial Robust Cuts for Semi-Supervised and Multi-label Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[12]  Cordelia Schmid,et al.  Transformation Pursuit for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  SingerYoram,et al.  On the algorithmic implementation of multiclass kernel-based vector machines , 2002 .

[15]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[16]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[17]  Avrim Blum,et al.  Planning in the Presence of Cost Functions Controlled by an Adversary , 2003, ICML.

[18]  Geoff S. Nitschke,et al.  Improving Deep Learning with Generic Data Augmentation , 2018, 2018 IEEE Symposium Series on Computational Intelligence (SSCI).

[19]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[20]  Deva Ramanan,et al.  Face detection, pose estimation, and landmark localization in the wild , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  A. Dawid,et al.  Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory , 2004, math/0410076.

[22]  Stefanie Nowak,et al.  How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation , 2010, MIR '10.

[23]  Haym Hirsh,et al.  Interactive Consensus Agreement Games for Labeling Images , 2016, HCOMP.

[24]  Stefanos Zafeiriou,et al.  A Semi-automatic Methodology for Facial Landmark Annotation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[25]  ZissermanAndrew,et al.  The Pascal Visual Object Classes Challenge , 2015 .

[26]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Universal Adversarial Perturbations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[28]  Cordelia Schmid,et al.  Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[30]  Rob Fergus,et al.  Learning from Noisy Labels with Deep Neural Networks , 2014, ICLR.

[31]  Simon Lucey,et al.  Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.

[32]  Tal Hassner,et al.  Do We Really Need to Collect Millions of Faces for Effective Face Recognition? , 2016, ECCV.

[33]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[34]  Seong Joon Oh,et al.  Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Yufeng Liu,et al.  Fisher Consistency of Multicategory Support Vector Machines , 2007, AISTATS.

[36]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Hong Wang,et al.  Adversarial Prediction Games for Multivariate Losses , 2015, NIPS.

[41]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[43]  Pietro Perona,et al.  The Multidimensional Wisdom of Crowds , 2010, NIPS.

[44]  Moustapha Cissé,et al.  Countering Adversarial Images using Input Transformations , 2018, ICLR.