Pose-Based View Synthesis for Vehicles: A Perspective Aware Method

In this paper, we focus on the problem of novel view synthesis for vehicles. Some previous works solve the problem of novel view synthesis in a controlled 3D environment by exploiting additional 3D details (i.e., camera viewpoints and underlying 3D models). However, in real scenarios, the 3D details are difficult to obtain. In this case, we find that introducing vehicle pose to represent the views of vehicles is an alternative paradigm to solve the lack of 3D details. In novel view synthesis, preserving local details is one of the most challenging problems. To address this problem, we propose a perspective-aware generative model (PAGM). We are motivated by the prior that vehicles are made of quadrilateral planes. Preserving these rigid planes during image generation ensures that image details are kept. To this end, a classic image transformation method is leveraged, i.e., perspective transformation. In our GAN-based system, the perspective transformation is applied to the encoder feature maps, and the resulting maps are regarded as new conditions for the decoder. This strategy preserves the quadrilateral planes all the way through the network, thus shuttling the texture details from the input image to the generated image. In the experiments, we show that PAGM can generate high-quality vehicle images with fine details. Quantitatively, our method is superior to several competing approaches employing either GAN or the perspective transformation. Code is available at: https://github.com/ilvkai/view-synthesis-for-vehicles

[1]  Ersin Yumer,et al.  Transformation-Grounded Image Generation Network for Novel 3D View Synthesis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[3]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[5]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[6]  Andrea Palazzi,et al.  Warp and Learn: Novel Views Generation for Vehicles and Other Objects. , 2020, IEEE transactions on pattern analysis and machine intelligence.

[7]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[8]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[9]  Carlos Hernandez,et al.  Multi-View Stereo: A Tutorial , 2015, Found. Trends Comput. Graph. Vis..

[10]  Ling-Yu Duan,et al.  Embedding Adversarial Learning for Vehicle Re-Identification , 2019, IEEE Transactions on Image Processing.

[11]  David Salesin,et al.  Image Analogies , 2001, SIGGRAPH.

[12]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[13]  Liang Zheng,et al.  Rethinking Triplet Loss for Domain Adaptation , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[14]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[15]  Thomas Brox,et al.  Multi-view 3D Models from Single Images with a Convolutional Network , 2015, ECCV.

[16]  Scott E. Reed,et al.  Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis , 2015, NIPS.

[17]  Jon Gauthier Conditional generative adversarial nets for convolutional face generation , 2015 .

[18]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[19]  John Flynn,et al.  Stereo magnification , 2018, ACM Trans. Graph..

[20]  Hugh F. Durrant-Whyte,et al.  Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[21]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[22]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[23]  Peter V. Gehler,et al.  A Generative Model of People in Clothing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiaogang Wang,et al.  Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ling Shao,et al.  Cross-View GAN Based Vehicle Generation for Re-identification , 2017, BMVC.

[28]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[29]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[30]  Nicu Sebe,et al.  Deformable GANs for Pose-Based Human Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Bing He,et al.  Part-Regularized Near-Duplicate Vehicle Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[33]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[34]  Kalyan Sunkavalli,et al.  Deep view synthesis from sparse photometric images , 2019, ACM Trans. Graph..

[35]  Ramesh Govindan,et al.  Augmented Vehicular Reality: Enabling Extended Vision for Future Vehicles , 2017, HotMobile.

[36]  Leonidas J. Guibas,et al.  3D-Assisted Feature Synthesis for Novel Views of an Object , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Mario Fritz,et al.  Novel Views of Objects from a Single Image , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Tao Mei,et al.  A Deep Learning-Based Approach to Progressive Vehicle Re-identification for Urban Surveillance , 2016, ECCV.

[39]  Ning Zhang,et al.  Multi-view to Novel View: Synthesizing Novel Views With Self-learned Confidence , 2018, ECCV.

[40]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[41]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[42]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[43]  Wu Liu,et al.  Large-scale vehicle re-identification in urban surveillance videos , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).

[44]  Luc Van Gool,et al.  Disentangled Person Image Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Xiaogang Wang,et al.  Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[47]  Jitendra Malik,et al.  View Synthesis by Appearance Flow , 2016, ECCV.

[48]  Li Zhang,et al.  Soft 3D reconstruction for view synthesis , 2017, ACM Trans. Graph..

[49]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[51]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Bo Zhao,et al.  Multi-View Image Generation from a Single-View , 2017, ACM Multimedia.

[53]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[54]  Adam Herout,et al.  Vehicle Re-identification for Automatic Video Traffic Surveillance , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[55]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[56]  Leonidas J. Guibas,et al.  3D-Assisted Image Feature Synthesis for Novel Views of an Object , 2014, ArXiv.

[57]  Yi Yang,et al.  Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[58]  Adam Herout,et al.  BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).