Generative Adversarial Networks for Unsupervised Object Co-localization

This paper introduces a novel approach to unsupervised object co-localization using Generative Adversarial Networks (GANs). A GAN is a powerful tool that can implicitly learn an unknown data distribution without supervision. Observing that a GAN discriminator is strongly influenced by the pixels where objects appear, we analyze the discriminator's internal layers and visualize the activated pixels. Our key finding is that high image diversity, a main goal of GAN research, is ironically disadvantageous for object localization: the discriminators of such GANs focus not only on the target object but also on various other objects, such as those in the background. Through extensive evaluations and experiments, we show that image diversity and localization performance are negatively correlated. In addition, our approach achieves meaningful unsupervised object co-localization accuracy on publicly available benchmark datasets, comparable even to state-of-the-art weakly-supervised approaches.
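
The abstract does not say how activated discriminator pixels are turned into a localization, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a C x H x W feature tensor already taken from some internal discriminator layer (in PyTorch, for instance, via a forward hook), sums it over channels (a simple CAM-style aggregation chosen here for illustration), thresholds at the mean activation, boxes the largest connected region, and rescales that box to image coordinates. It also scores boxes with CorLoc, the metric commonly reported on co-localization benchmarks (the fraction of images whose predicted box overlaps a ground-truth box with IoU above 0.5); the abstract does not name its metric, so this is an assumption too. All function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max); max coordinates are exclusive.
Box = Tuple[float, float, float, float]


def activation_map(features: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Hypothetical helper: collapse a C x H x W discriminator feature tensor to H x W.

    Summing over channels is an assumed aggregation, not taken from the paper.
    """
    h, w = len(features[0]), len(features[0][0])
    return [[sum(ch[y][x] for ch in features) for x in range(w)] for y in range(h)]


def largest_component_box(amap: List[List[float]]) -> Box:
    """Threshold at the mean activation (an assumption) and box the largest 4-connected region."""
    h, w = len(amap), len(amap[0])
    mean = sum(map(sum, amap)) / (h * w)
    mask = [[v > mean for v in row] for row in amap]
    seen = [[False] * w for _ in range(h)]
    best: List[Tuple[int, int]] = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill gathers one connected component.
            comp: List[Tuple[int, int]] = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    if not best:  # flat map: nothing stands out, so fall back to the whole map
        return (0, 0, w, h)
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def to_image_coords(box: Box, map_hw: Tuple[int, int], image_hw: Tuple[int, int]) -> Box:
    """Rescale a feature-map box to image pixels by plain proportional scaling."""
    sx, sy = image_hw[1] / map_hw[1], image_hw[0] / map_hw[0]
    return (box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corloc(preds: List[Box], gts: List[List[Box]], thresh: float = 0.5) -> float:
    """Assumed metric: fraction of images whose box has IoU > thresh with any ground-truth box."""
    hits = sum(any(iou(p, g) > thresh for g in gt) for p, gt in zip(preds, gts))
    return hits / len(preds) if preds else 0.0


if __name__ == "__main__":
    # Toy 2-channel 4x4 "feature map" with a strong blob in the top-left corner.
    feats = [
        [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]],
        [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    ]
    box = to_image_coords(largest_component_box(activation_map(feats)), (4, 4), (64, 64))
    print(box)  # (0.0, 0.0, 32.0, 32.0)
    print(corloc([box, box], [[(0, 0, 32, 32)], [(16, 16, 48, 48)]]))  # 0.5
```

The aggregation, threshold, and largest-component rule are deliberately simple placeholders; each can be swapped independently without changing the rest of the pipeline. If a discriminator also attends to background objects, as the abstract reports for high-diversity GANs, those regions raise the activation map outside the target and can pull the thresholded region, and therefore the box, away from it.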
