Paper information - CDFI: Compression-Driven Network Design for Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

DNN-based frame interpolation, which generates intermediate frames from two consecutive frames, typically relies on heavy model architectures with a huge number of features, which prevents deployment on systems with limited resources, e.g., mobile devices. We propose a compression-driven network design for frame interpolation (CDFI) that leverages model pruning through sparsity-inducing optimization to significantly reduce the model size while achieving superior performance. Concretely, we first compress the recently proposed AdaCoF model and show that a 10× compressed AdaCoF performs similarly to its original counterpart; we then further improve this compressed model by introducing a multi-resolution warping module, which boosts visual consistency with multi-level details. As a consequence, we achieve a significant performance gain with only a quarter of the size of the original AdaCoF. Moreover, our model performs favorably against other state-of-the-art methods on a broad range of datasets. Finally, the proposed compression-driven framework is generic and can be easily transferred to other DNN-based frame interpolation algorithms. Our source code is available at https://github.com/tding1/CDFI.
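To illustrate what "sparsity-inducing optimization" means in general, the following is a minimal sketch of proximal gradient descent with an l1 penalty on a toy quadratic objective. It is not the optimizer used by CDFI, and the function names, the toy targets, and the hyperparameters are all assumptions made for illustration. The point it shows is that the soft-thresholding step drives small weights to exactly zero, which is what makes the corresponding parameters prunable.

```python
def soft_threshold(value, tau):
    """Proximal operator of tau * |w|: shrink toward zero, clip to exactly 0."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def prox_gradient_l1(targets, lam, lr=0.5, steps=200, init=1.0):
    """Minimize 0.5 * sum((w_i - t_i)^2) + lam * sum(|w_i|) (toy objective).

    Each iteration takes a gradient step on the smooth quadratic term and
    then applies soft-thresholding for the l1 term.
    """
    weights = [init] * len(targets)
    for _ in range(steps):
        weights = [
            soft_threshold(w - lr * (w - t), lr * lam)
            for w, t in zip(weights, targets)
        ]
    return weights


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    # Hypothetical "importance" targets: two large, two near zero.
    toy_targets = [3.0, 0.05, -2.0, 0.01]
    solution = prox_gradient_l1(toy_targets, lam=0.1)
    print("weights:", solution)
    print("fraction of zeros:", sparsity(solution))
```

In a real network the same idea is applied to convolution weights or whole filters, and the zeroed entries indicate which parts of the architecture can be removed before retraining a smaller model.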
