Background Inpainting for Videos with Dynamic Objects and a Free-Moving Camera

We propose a method for removing marked dynamic objects from videos captured with a free-moving camera, provided that the objects occlude only parts of the scene that belong to a static background. Our approach takes as input a video, a mask marking the object to be removed, and a mask marking the dynamic objects that should remain in the scene. To inpaint a frame, we align to it other candidate frames in which parts of the missing region are visible. Among these candidates, a single source is chosen to fill each pixel so that the final arrangement is color-consistent. Intensity differences between sources are smoothed using gradient-domain fusion. Our frame alignment process assumes that the scene can be approximated by piecewise planar geometry: a set of homographies is estimated for each frame pair, and one homography is selected for aligning each pixel such that the color discrepancy is minimized and the epipolar constraints are maintained. We provide experimental validation on several real-world video sequences to demonstrate that, unlike in previous work, inpainting videos shot with free-moving cameras does not necessarily require estimating absolute camera positions and per-frame, per-pixel depth maps.
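
The per-pixel homography selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes color frames stored as H x W x 3 NumPy arrays, a list of candidate 3 x 3 homographies that is already given, nearest-neighbor sampling, and a greedy choice made independently at each pixel. The epipolar constraints and any joint optimization over neighboring pixels are omitted, and the names `warp_coords` and `select_homography_per_pixel` are hypothetical.

```python
import numpy as np


def warp_coords(H, xs, ys):
    """Map pixel coordinates (xs, ys) through the 3x3 homography H."""
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=0).astype(float)  # 3 x N
    mapped = H @ pts
    # Points sent to infinity become inf/nan and are rejected by the caller.
    with np.errstate(divide="ignore", invalid="ignore"):
        return mapped[0] / mapped[2], mapped[1] / mapped[2]


def select_homography_per_pixel(target, source, homographies):
    """Greedy per-pixel choice among candidate homographies (assumed given).

    Each homography maps target pixel coordinates to source coordinates.
    Returns (labels, best_cost): the index of the homography with the
    smallest squared color difference at each target pixel, and that cost.
    Pixels that no homography maps inside the source keep cost inf
    (their label defaults to 0).
    """
    h, w = target.shape[:2]
    hs, ws = source.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xs, ys = xs.ravel(), ys.ravel()
    costs = np.full((len(homographies), h * w), np.inf)
    for k, H in enumerate(homographies):
        u, v = warp_coords(H, xs, ys)
        ur, vr = np.rint(u), np.rint(v)  # nearest-neighbor sampling
        # Comparisons are False for nan, so invalid points drop out here.
        valid = (ur >= 0) & (ur < ws) & (vr >= 0) & (vr < hs)
        ui, vi = ur[valid].astype(int), vr[valid].astype(int)
        diff = source[vi, ui].astype(float) - target[ys[valid], xs[valid]].astype(float)
        costs[k, valid] = np.sum(diff ** 2, axis=-1)
    labels = np.argmin(costs, axis=0).reshape(h, w)
    best_cost = np.min(costs, axis=0).reshape(h, w)
    return labels, best_cost
```

In this simplified form each pixel is labeled in isolation, so the result can be noisy in textureless regions; the method summarized in the abstract additionally requires the selected homographies to respect the epipolar constraints.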
