Physical Context and Timing Aware Sequence Generating GANs

Generative Adversarial Networks (GANs) have shown remarkable success in generating realistic images and interpolating changes between images. Existing models, however, do not take the physical contexts behind images into account when generating them, which can cause unrealistic changes. Furthermore, it is difficult to generate changes at a specific timing, and the generated changes often do not match actual ones. This paper proposes a novel GAN, named Physical Context and Timing aware sequence generating GANs (PCTGAN), that generates an image in a sequence at a specific timing between two given images while considering the physical contexts behind them. Our method consists of three components: an encoder, a generator, and a discriminator. The encoder estimates latent vectors from the beginning and ending images, their timings, and a target timing. The generator generates images and physical contexts at the beginning, ending, and target timings from the corresponding latent vectors. The discriminator discriminates whether the generated images and contexts are real or fake. In the experiments, PCTGAN is applied to a dataset of sequential shape changes in die forging processes. We show that conditioning on both timing and physical contexts is effective in generating sequential images.
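The abstract describes the encoder-generator-discriminator structure without implementation detail; the PyTorch sketch below illustrates one plausible realization of the three components. It is a minimal sketch, not the authors' implementation: all module names, layer widths, tensor shapes, and the use of grayscale images with a single physical-context channel (e.g., a stress or strain field in die forging) are assumptions made for illustration.

```python
# Minimal sketch of the three PCTGAN components described in the abstract.
# All names, shapes, and layer sizes are illustrative assumptions; the
# paper's actual architecture may differ.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps (begin image, end image, their timings, target timing) to latent vectors.

    Timings are expected as (B, 1) float tensors, e.g., normalized to [0, 1].
    """
    def __init__(self, img_channels=1, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * img_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Image features are concatenated with 3 scalar timings:
        # beginning, ending, and target.
        self.fc = nn.Linear(64 + 3, 3 * latent_dim)

    def forward(self, img_begin, img_end, t_begin, t_end, t_target):
        h = self.conv(torch.cat([img_begin, img_end], dim=1))
        h = torch.cat([h, t_begin, t_end, t_target], dim=1)
        # One latent vector per timing: beginning, ending, target.
        return self.fc(h).chunk(3, dim=1)


class Generator(nn.Module):
    """Decodes one latent vector into an image and its physical-context map."""
    def __init__(self, latent_dim=128, img_channels=1, ctx_channels=1):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, img_channels + ctx_channels, 4, stride=2, padding=1),
            nn.Tanh(),
        )
        self.img_channels = img_channels

    def forward(self, z):
        out = self.deconv(self.fc(z).view(-1, 64, 8, 8))
        # Split the output into image channels and physical-context channels.
        return out[:, :self.img_channels], out[:, self.img_channels:]


class Discriminator(nn.Module):
    """Scores whether an (image, physical context) pair is real or generated."""
    def __init__(self, img_channels=1, ctx_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + ctx_channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, img, ctx):
        return self.net(torch.cat([img, ctx], dim=1))


# Example forward pass with hypothetical 32x32 grayscale frames.
enc, gen, disc = Encoder(), Generator(), Discriminator()
x0 = torch.randn(4, 1, 32, 32)   # beginning image
x1 = torch.randn(4, 1, 32, 32)   # ending image
t0, t1, tt = torch.zeros(4, 1), torch.ones(4, 1), torch.full((4, 1), 0.5)
z0, z1, zt = enc(x0, x1, t0, t1, tt)
img_t, ctx_t = gen(zt)           # generated frame and context at the target timing
score = disc(img_t, ctx_t)       # real/fake logit
```

In this reading, the encoder produces one latent vector per timing, the generator decodes each latent into an (image, context) pair, and the discriminator scores the pair jointly, so adversarial training pushes the generated images and their physical contexts to be mutually consistent with real ones.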
