Adaptive Hierarchical Motion-Focused Model for Video Prediction

Video prediction is a promising task in computer vision with many real-world applications and is worth exploring. Most existing methods generate new frames from appearance features with few constraints, which results in blurry predictions. Recently, motion-focused methods have been proposed to alleviate this problem. However, capturing object motions from a video sequence and applying the learned motions to appearance remains difficult, owing to the variety and complexity of real-world motions. In this paper, an adaptive hierarchical motion-focused model is introduced to predict realistic future frames. The model combines hierarchical motion modeling with an adaptive transformation strategy, enabling better motion understanding and application. We train the model end to end and employ adversarial training to improve the quality of the generated frames. Experiments on two challenging datasets, Penn Action and UCF101, demonstrate that the proposed model is effective and competitive with state-of-the-art approaches.
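The abstract does not spell out the model's architecture, but the motion-focused recipe it describes can be illustrated with a minimal sketch: predict dense motion fields at several scales from the past frames, warp the most recent frame with each field (an adaptive transformation applied to appearance), and fuse the warped candidates into the next frame. The module and parameter names below (HierarchicalWarpPredictor, warp, num_scales) are hypothetical and are not taken from the paper; this is an assumption-laden sketch of the general idea, not the authors' implementation.

```python
# Hedged sketch of hierarchical, motion-focused frame prediction.
# All module names and hyperparameters here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(frame, flow):
    """Backward-warp `frame` (B, 3, H, W) with a dense flow field (B, 2, H, W)."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / max(w - 1, 1) - 1.0, 2.0 * grid_y / max(h - 1, 1) - 1.0),
        dim=-1,
    )
    return F.grid_sample(frame, grid, align_corners=True)


class HierarchicalWarpPredictor(nn.Module):
    """Predicts motion fields at several scales and fuses the warped results."""

    def __init__(self, in_frames=4, num_scales=3):
        super().__init__()
        self.motion_heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_frames * 3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 2, 3, padding=1),  # dense (dx, dy) field
            )
            for _ in range(num_scales)
        )
        self.fuse = nn.Conv2d(num_scales * 3, 3, 3, padding=1)

    def forward(self, frames):
        # frames: (B, T, 3, H, W); the last frame supplies the appearance to warp.
        b, t, c, h, w = frames.shape
        stacked = frames.reshape(b, t * c, h, w)
        last = frames[:, -1]
        warped = []
        for s, head in enumerate(self.motion_heads):
            scale = 2 ** s
            coarse = F.avg_pool2d(stacked, scale) if scale > 1 else stacked
            flow = head(coarse)
            if scale > 1:  # upsample the coarse flow back to full resolution
                flow = F.interpolate(flow, size=(h, w), mode="bilinear",
                                     align_corners=True) * scale
            warped.append(warp(last, flow))
        return self.fuse(torch.cat(warped, dim=1))


if __name__ == "__main__":
    model = HierarchicalWarpPredictor(in_frames=4)
    clip = torch.randn(2, 4, 3, 64, 64)   # two 4-frame input clips
    next_frame = model(clip)
    print(next_frame.shape)               # torch.Size([2, 3, 64, 64])
```

In an adversarial setup like the one the abstract mentions, such a predictor would play the role of the generator, with a separate discriminator judging predicted frames against real ones; that discriminator is omitted here.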
