Semantic Predictive Coding with Arbitrated Generative Adversarial Networks

In spatio-temporal predictive coding problems, such as next-frame prediction in video, the content of plausible future frames is determined primarily from the image dynamics of previous frames. We establish an alternative approach based on the underlying semantic information of the data, intended for data that do not necessarily have a temporal aspect but instead follow some form of associative ordering. In this work, we introduce the notion of semantic predictive coding by proposing a novel generative adversarial modeling framework that incorporates an arbiter classifier as a new component. While the generator is primarily tasked with anticipating possible next frames, the arbiter’s principal role is to assess their credibility. Since the denotative meaning of each forthcoming element can be encapsulated in a generic label describing its content, a classification loss is introduced alongside the adversarial loss. As supported by our experimental findings in a next-digit and a next-letter scenario, the arbiter not only improves GAN performance but also broadens the network’s creative capabilities in terms of the diversity of the generated symbols.
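
To make the combination of a classification loss with the adversarial loss concrete, the following is a minimal sketch of one plausible generator objective. The abstract does not give the exact formulation, so everything here is an assumption: the binary cross-entropy adversarial term, the cross-entropy arbiter term, the additive weighting, and the hypothetical names `generator_loss`, `disc_prob_real`, `arbiter_probs`, `next_label`, and `weight`.

```python
import math

EPS = 1e-12  # guards against log(0)

def binary_cross_entropy(p, target):
    """Binary cross-entropy for one probability p and a 0/1 target."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def cross_entropy(class_probs, label):
    """Cross-entropy of a predicted class distribution against an integer label."""
    return -math.log(max(class_probs[label], EPS))

def generator_loss(disc_prob_real, arbiter_probs, next_label, weight=1.0):
    """Assumed combined objective (not taken from the paper):
    adversarial term + weight * classification term.

    disc_prob_real: discriminator's probability that the generated frame is real.
    arbiter_probs:  arbiter's class distribution for the generated frame.
    next_label:     label of the element expected next in the associative order.
    """
    adversarial = binary_cross_entropy(disc_prob_real, 1)  # generator wants "real"
    classification = cross_entropy(arbiter_probs, next_label)
    return adversarial + weight * classification

# Next-digit toy case with three classes: the expected next symbol is class 1.
matching = generator_loss(0.9, [0.05, 0.90, 0.05], next_label=1)
mismatching = generator_loss(0.9, [0.90, 0.05, 0.05], next_label=1)
```

Under these assumptions, a generated frame that the arbiter assigns to the expected next label yields a lower loss than an equally realistic frame assigned to a different label, which is the kind of pressure a classification term would add on top of the adversarial one.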
