Computer Vision – ACCV 2018

We present a new stage-wise learning paradigm for training generative adversarial networks (GANs). The goal of our work is to progressively strengthen the discriminator and thus, the generators, with each subsequent stage without changing the network architecture. We call this proposed method the RankGAN. We first propose a marginbased loss for the GAN discriminator. We then extend it to a marginbased ranking loss to train the multiple stages of RankGAN. We focus on face images from the CelebA dataset in our work and show visual as well as quantitative improvements in face generation and completion tasks over other GAN approaches, including WGAN and LSGAN.

[1]  Brendan J. Frey,et al.  Video Epitomes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  Marc'Aurelio Ranzato,et al.  Video (language) modeling: a baseline for generative models of natural videos , 2014, ArXiv.

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  Mubarak Shah,et al.  UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.

[5]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ralph R. Martin,et al.  Video completion using tracking and fragment merging , 2005, The Visual Computer.

[7]  Andrew W. Fitzgibbon,et al.  Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Kazufumi Ito,et al.  Optimal Control Formulation for Determining Optical Flow , 2002, SIAM J. Sci. Comput..

[11]  Christine Guillemot,et al.  Video Inpainting With Short-Term Windows: Application to Object Removal and Error Concealment , 2015, IEEE Transactions on Image Processing.

[12]  Xiaochun Cao,et al.  Video Completion for Perspective Camera Under Constrained Motion , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[13]  Alan L. Yuille,et al.  Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Alex Graves,et al.  Video Pixel Networks , 2016, ICML.

[15]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Uwe Stilla,et al.  THERMAL 3D MAPPING FOR OBJECT DETECTION IN DYNAMIC SCENES , 2014 .

[18]  Jan Kautz,et al.  Background Inpainting for Videos with Dynamic Objects and a Free-Moving Camera , 2012, ECCV.

[19]  Yann LeCun,et al.  Deep multi-scale video prediction beyond mean square error , 2015, ICLR.

[20]  Seunghoon Hong,et al.  Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.

[21]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[22]  Tai-Pang Wu,et al.  Video repairing: inference of foreground and background under severe occlusion , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[23]  Jana Kosecka,et al.  Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks , 2016 .

[24]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[25]  Hongdong Li,et al.  Learning Image Matching by Simply Watching Video , 2016, ECCV.

[26]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Guillermo Sapiro,et al.  Video Inpainting Under Constrained Camera Motion , 2007, IEEE Transactions on Image Processing.

[28]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[29]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[30]  Thomas Serre,et al.  HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.

[31]  Marios Savvides,et al.  Spartans: Single-Sample Periocular-Based Alignment-Robust Recognition Technique Applied to Non-Frontal Scenarios , 2015, IEEE Transactions on Image Processing.

[32]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Patrick Pérez,et al.  Video Inpainting of Complex Scenes , 2014, SIAM J. Imaging Sci..

[35]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[36]  Michael Bosse,et al.  3D thermal mapping of building interiors using an RGB-D and thermal camera , 2013, 2013 IEEE International Conference on Robotics and Automation.

[37]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[38]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[39]  Roberto Cipolla,et al.  Multi-task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Gabriel Kreiman,et al.  Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning , 2016, ICLR.

[42]  Dirk A. Lorenz,et al.  Image Sequence Interpolation Using Optimal Control , 2010, Journal of Mathematical Imaging and Vision.

[43]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[44]  Horst Bischof,et al.  Optical Flow Guided TV-L1 Video Interpolation and Restoration , 2011, EMMCVPR.

[45]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).