Paper information - Can Adversarial Networks Hallucinate Occluded People With a Plausible Aspect?

When you see a person in a crowd, occluded by other people, you miss visual information that could be used to recognize, re-identify or simply classify him or her. You can only imagine the person's appearance based on your experience. Similarly, AI solutions can try to hallucinate the missing information with specific deep learning architectures, suitably trained on images of people with and without occlusions. The goal of this work is to generate a complete image of a person, given an occluded version as input, that is (a) free of occlusion, (b) similar at pixel level to a fully visible person shape, and (c) able to preserve the visual attributes (e.g. male/female) of the original. To this end, we propose a new approach that integrates state-of-the-art neural network architectures, namely U-Nets and GANs, together with discriminative attribute classification networks, into an architecture specifically designed to de-occlude people shapes. The network is trained to optimize a loss function that takes the aforementioned objectives into account. We also propose two datasets for testing our solution: the first, occluded RAP, is created automatically by occluding real shapes from the RAP dataset (which also collects attributes of people's appearance); the second, AiC, is a large synthetic dataset generated with computer graphics from data extracted from the GTA video game, which contains 3D data of occluded objects by construction. Results are impressive and outperform previous proposals. This work could be an initial step toward further research on recognizing people and their behavior in an open, crowded world.
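As an illustration of how a single loss might combine the three objectives listed in the abstract, here is a minimal sketch in plain Python. It is not the authors' formulation: the abstract does not give the loss, so the choice of terms (an adversarial term, a mean absolute pixel error, and a binary cross-entropy on attributes), the function names `bce` and `generator_loss`, and the weights `w_adv`, `w_pix`, `w_attr` are all assumptions made for this example.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy of one probability p against a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def generator_loss(d_fake, generated, target, attr_pred, attr_true,
                   w_adv=1.0, w_pix=1.0, w_attr=1.0):
    """Hypothetical weighted sum of three terms (all assumed, not from the paper).

    d_fake    : discriminator probability that the generated image is real
    generated : flat list of generated pixel values
    target    : flat list of pixel values of the non-occluded image
    attr_pred : attribute probabilities predicted on the generated image
    attr_true : 0/1 attribute labels of the original person
    """
    # Adversarial term: the generator wants the discriminator to say "real".
    adv = bce(d_fake, 1)
    # Pixel-level term: mean absolute error against the fully visible shape.
    pix = sum(abs(g - t) for g, t in zip(generated, target)) / len(target)
    # Attribute term: mean cross-entropy over the attribute labels.
    attr = sum(bce(p, y) for p, y in zip(attr_pred, attr_true)) / len(attr_true)
    return w_adv * adv + w_pix * pix + w_attr * attr

# Toy call: two pixels, one attribute.
loss = generator_loss(0.5, [0.0, 1.0], [0.0, 1.0], [0.5], [1])
print(round(loss, 6))
```

In a real training setup these terms would be computed on image tensors by a deep learning framework, and the relative weights would have to be tuned; the sketch only shows the shape of such a combined objective.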
