Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer

Video style transfer techniques inspire many exciting applications on mobile devices. However, their efficiency and stability are still far from satisfactory. To boost transfer stability across frames, optical flow is widely adopted despite its high computational cost, e.g., occupying over 97% of the inference time. This paper proposes to learn a lightweight video style transfer network via a knowledge distillation paradigm. We adopt two teacher networks, one of which takes optical flow during inference while the other does not. The output difference between these two teacher networks highlights the improvements made by optical flow and is then used to distill the target student network. Furthermore, a low-rank distillation loss is employed to stabilize the output of the student network by mimicking the rank of the input videos. Extensive experiments demonstrate that our student network, without any optical flow module, still generates stable videos and runs much faster than the teacher network.
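
To make the two distillation signals concrete, the PyTorch sketch below shows one plausible way to instantiate them: a flow-guided distillation term that weights the student-versus-teacher error by the per-pixel difference between the two teachers, and a low-rank term that uses the nuclear norm as a convex surrogate for rank. The re-weighting scheme, the nuclear-norm surrogate, the balance weight, and all tensor shapes are illustrative assumptions, not the exact losses used in the paper.

```python
# Minimal sketch of the two distillation losses described above (assumptions noted inline).
import torch


def flow_guided_distillation_loss(student_out, teacher_flow_out, teacher_noflow_out):
    """Distill the student toward the flow-based teacher, emphasising regions
    where optical flow actually changed the teacher's output.
    All tensors are stylised frames of shape (T, C, H, W)."""
    # Per-pixel difference between the two teachers highlights the
    # improvements contributed by optical flow.
    improvement = (teacher_flow_out - teacher_noflow_out).abs().mean(dim=1, keepdim=True)
    weight = 1.0 + improvement / (improvement.mean() + 1e-8)  # assumed re-weighting
    return (weight * (student_out - teacher_flow_out) ** 2).mean()


def low_rank_distillation_loss(student_out, input_frames):
    """Encourage the stacked student outputs to mimic the rank of the stacked
    input frames, using the nuclear norm as a convex surrogate for rank."""
    s = student_out.flatten(start_dim=1)    # (T, C*H*W) matrix of output frames
    x = input_frames.flatten(start_dim=1)   # (T, C*H*W) matrix of input frames
    return (torch.linalg.norm(s, ord='nuc') - torch.linalg.norm(x, ord='nuc')).abs()


if __name__ == "__main__":
    T, C, H, W = 4, 3, 64, 64               # a short clip of 4 frames
    frames = torch.rand(T, C, H, W)
    student = torch.rand(T, C, H, W)
    teacher_flow = torch.rand(T, C, H, W)
    teacher_noflow = torch.rand(T, C, H, W)
    loss = flow_guided_distillation_loss(student, teacher_flow, teacher_noflow) \
           + 0.1 * low_rank_distillation_loss(student, frames)  # assumed balance weight
    print(loss.item())
```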
