Unsupervised Object Transfiguration with Attention

Object transfiguration is a subtask of image-to-image translation, which translates between two independent image sets and has a wide range of applications. Recently, studies based on Generative Adversarial Networks (GANs) have achieved impressive results in image-to-image translation. However, object transfiguration should translate only the regions containing target objects rather than whole images. Most existing methods do not consider this, which results in mistranslated image backgrounds. To address this problem, we present a novel pipeline called Deep Attention Unit Generative Adversarial Networks (DAU-GAN). During translation, the DAU computes attention masks that indicate where the target objects are, so that the GAN concentrates on translating target objects while ignoring irrelevant backgrounds. Additionally, we construct an attention-consistent loss and a background-consistent loss that compel the model to focus translation on target objects and to preserve backgrounds more effectively. Comparison experiments on three popular related datasets demonstrate that DAU-GAN achieves superior performance to the state of the art. We also visualize the attention masks at different stages to confirm their effect on the object transfiguration task.

The proposed DAU-GAN translates objects effectively while preserving background information. In our model, the DAU learns to focus on the most important information by producing attention masks. These masks compel DAU-GAN to distinguish target objects from backgrounds during translation and to achieve impressive translation results on two subsets of ImageNet and on CelebA. Moreover, the results suggest that the model can be studied not only through the image itself but also through information from other modalities.
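The role of an attention mask can be illustrated with a minimal sketch. The abstract does not give the DAU-GAN equations, so everything below is an assumption made for illustration: the function names, the soft-mask blending rule `mask * translated + (1 - mask) * source`, and the L1 form of the background term are hypothetical and are not taken from the paper.

```python
import numpy as np


def compose_with_mask(source, translated, mask):
    """Blend the translated image into the source using a soft attention mask.

    Assumed rule (not from the paper): mask values near 1 take the
    translated pixel, values near 0 keep the source pixel.
    """
    mask = np.clip(mask, 0.0, 1.0)
    return mask * translated + (1.0 - mask) * source


def background_consistency_loss(source, output, mask):
    """Mean absolute difference between source and output on background pixels.

    Hypothetical L1 form: each pixel's difference is weighted by (1 - mask).
    """
    background = 1.0 - np.clip(mask, 0.0, 1.0)
    return float(np.mean(background * np.abs(output - source)))


# Toy data: a 4x4 RGB image with a 2x2 "object" region in the centre.
rng = np.random.default_rng(0)
source = rng.random((4, 4, 3))
translated = rng.random((4, 4, 3))
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3, 0] = 1.0  # the mask broadcasts over the colour channels

output = compose_with_mask(source, translated, mask)
loss = background_consistency_loss(source, output, mask)
```

With a hard binary mask, as in this toy case, background pixels are copied from the source unchanged, so the background term is zero. With a learned soft mask, the term would instead penalise any translation that leaks into the background.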
