PWStableNet: Learning Pixel-Wise Warping Maps for Video Stabilization

Videos captured by hand-held cameras are often perturbed by high-frequency jitter, so stabilizing them is an essential task. Many video stabilization methods have been proposed. However, most of them estimate one global homography, or several homographies based on fixed meshes, to warp the shaky frames into their stabilized views. Because of parallax, a single homography or a small number of homographies cannot handle depth variation well. In contrast to these traditional methods, we propose a novel video stabilization network, called PWStableNet, which produces pixel-wise warping maps, i.e., potentially different warping for different pixels, and warps each pixel to its stabilized view. To the best of our knowledge, this is the first deep learning based pixel-wise video stabilization method. The proposed method is built upon a multi-stage cascade encoder-decoder architecture and learns pixel-wise warping maps from consecutive unstable frames. Inter-stage connections add the feature maps of an earlier stage to the corresponding feature maps of a later stage, which enables the later stage to learn the residual with respect to the feature maps of the earlier stages. This cascade architecture can produce more precise warping maps at later stages. To ensure that the pixel-wise warping maps are learned correctly, we use a well-designed loss function to guide the training of PWStableNet. The proposed stabilization method achieves performance comparable to traditional methods, with stronger robustness and much faster processing speed. Moreover, it outperforms some typical CNN-based stabilization methods, especially on videos with strong parallax. Code will be provided at https://github.com/mindazhao/pix-pix-warping-video-stabilization.
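To make the contrast between a single homography and a pixel-wise warping map concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a warping map stores, for every output pixel, the absolute (x, y) source coordinates in the unstable frame, and that sampling is bilinear with border clamping; the abstract does not specify these conventions. The function names `warp_with_pixel_map` and `homography_to_map` are hypothetical.

```python
import numpy as np

def warp_with_pixel_map(frame, warp_x, warp_y):
    """Backward-warp `frame` (H, W) or (H, W, C) with per-pixel source coordinates.

    Assumed convention: output pixel (i, j) is sampled from the unstable frame
    at location (warp_x[i, j], warp_y[i, j]), using bilinear interpolation and
    clamping coordinates to the image border.
    """
    h, w = frame.shape[:2]
    x = np.clip(warp_x, 0, w - 1)
    y = np.clip(warp_y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = x - x0  # horizontal interpolation weight
    fy = y - y0  # vertical interpolation weight
    if frame.ndim == 3:  # broadcast weights over the channel axis
        fx = fx[..., None]
        fy = fy[..., None]
    top = frame[y0, x0] * (1 - fx) + frame[y0, x1] * fx
    bottom = frame[y1, x0] * (1 - fx) + frame[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def homography_to_map(H, h, w):
    """Dense map induced by one 3x3 homography: every pixel shares the same model."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1) @ H.T
    return pts[..., 0] / pts[..., 2], pts[..., 1] / pts[..., 2]

if __name__ == "__main__":
    frame = np.arange(12, dtype=float).reshape(3, 4)
    # A global homography constrains all pixels to one 8-degree-of-freedom model.
    gx, gy = homography_to_map(np.eye(3), 3, 4)
    # A pixel-wise map can move each pixel independently, e.g. shift only row 0.
    px, py = gx.copy(), gy.copy()
    px[0] += 1.0
    print(warp_with_pixel_map(frame, px, py))
```

In this sketch a homography is just one special case of a dense map, which is why a pixel-wise map can in principle represent depth-dependent motion that a single homography cannot.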
