Deep Online Video Stabilization

Video stabilization is essential for most hand-held captured videos because of high-frequency shake. Several 2D-, 2.5D- and 3D-based stabilization techniques have been well studied, but to our knowledge, no solutions based on deep neural networks have been proposed. The main reasons are the shortage of training data and the difficulty of modeling the problem with neural networks. In this paper, we solve the video stabilization problem using a convolutional neural network (ConvNet). Instead of performing offline, holistic camera path smoothing based on feature matching, we focus on low-latency, real-time camera path smoothing without explicitly representing the camera path. Our network, called StabNet, learns a transformation for each unsteady input frame progressively along the timeline, while creating a more stable latent camera path. To train the network, we create a dataset of synchronized steady/unsteady video pairs captured with specially designed hand-held hardware. Experimental results show that the proposed online method (which uses no future frames) performs comparably to traditional offline video stabilization methods, while running about 30 times faster. Further, the proposed StabNet is able to handle night-time and blurry videos, where existing methods fail because robust feature matching breaks down.
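To make the online constraint concrete, the following is a minimal sketch of a causal stabilization loop on a one-dimensional camera path. It is not StabNet: the abstract does not specify the network, so a simple exponential-smoothing rule stands in for the learned model, and the path is represented explicitly only for illustration. All names here (`ema_predictor`, `stabilize_online`, `roughness`, `history`, `alpha`) are hypothetical. The only point illustrated is that each correction depends on the current input and previously stabilized outputs, never on future frames.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def ema_predictor(past_stable: List[float], current: float, alpha: float = 0.2) -> float:
    """Hypothetical stand-in for a learned model (NOT StabNet).

    Returns the offset to add to the current camera position, computed from
    previously stabilized positions and the current input only.
    """
    if not past_stable:
        return 0.0  # first frame: nothing to compare against, leave it unchanged
    # Pull the current position toward the most recent stabilized position.
    target = (1.0 - alpha) * past_stable[-1] + alpha * current
    return target - current


def stabilize_online(
    path: Iterable[float],
    predictor: Callable[[List[float], float], float] = ema_predictor,
    history: int = 5,
) -> Iterator[float]:
    """Yield stabilized positions one at a time, without looking ahead."""
    past: deque = deque(maxlen=history)  # bounded buffer of past stabilized outputs
    for position in path:
        offset = predictor(list(past), position)
        stable = position + offset
        past.append(stable)
        yield stable


def roughness(path: List[float]) -> float:
    """Sum of absolute frame-to-frame displacements (lower means steadier)."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


shaky = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0]
steady = list(stabilize_online(shaky))
print(roughness(shaky), roughness(steady))
```

Because the loop is a generator that consumes one position at a time, its output for any prefix of the input is identical to the corresponding prefix of the full output, which is the property that distinguishes an online method from offline whole-path smoothing.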
