Brain-Inspired Inference on Missing Video Sequence

In this paper, we propose a novel end-to-end architecture that could generate a variety of plausible video sequences correlating two given discontinuous frames. Our work is inspired by the human ability of inference. Specifically, given two static images, human are capable of inferring what might happen in between as well as present diverse versions of their inference. We firstly train our model to learn the transformation to understand the movement trends within given frames. For the sake of imitating the inference of human, we introduce a latent variable sampled from Gaussian distribution. By means of integrating different latent variables with learned transformation features, the model could learn more various possible motion modes. Then applying these motion modes on the original frame, we could acquire various corresponding intermediate video sequence. Moreover, the framework is trained in adversarial fashion with unsupervised learning. Evaluating on the moving Mnist dataset and the 2D Shapes dataset, we show that our model is capable of imitating the human inference to some extent.