High-Quality Stereo Video Matching via User Interaction and Space-Time Propagation

Even current state-of-the-art automatic stereo matching methods often struggle on natural images and videos, in large part because of fundamental matching ambiguities in low-texture regions and a lack of higher-level object knowledge. Stereo image matching can benefit greatly from user input that guides the matching process and helps disambiguate matches. However, applying interactive correction tools from scratch on each frame of a video would not only discard valuable information the user has already provided on other frames, but would also likely be too time-consuming to be practical, even if excellent disparity maps could be obtained within a few minutes per frame. In this work, we propose a stereo video matching system that lets the user interactively obtain high-quality, dense disparity maps on key frames and then intelligently propagates the user input and key-frame disparities to automatically produce high-quality disparity maps on intermediate frames. The key-frame disparity maps are obtained with several novel, easy-to-use, and effective interactive tools. Our novel propagation algorithm estimates 3D transformations that map user-corrected areas in key frames to intermediate frames. Experiments demonstrate the effectiveness and efficiency of our hybrid interactive/automatic approach.
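The abstract does not say how the 3D transformations are parameterized or estimated. As an illustration only, the following sketch makes several assumptions. It assumes a rectified pinhole stereo rig with focal length `f`, baseline `baseline`, and principal point (`cx`, `cy`). It models the motion of one user-corrected region as a single rigid transform (rotation `R`, translation `t`). It fits that transform with the standard Kabsch/Procrustes least-squares method from 3D point correspondences between a key frame and an intermediate frame. All function names are hypothetical, and this is not the authors' algorithm.

```python
import numpy as np


def disparity_to_points(u, v, d, f, baseline, cx, cy):
    """Back-project pixels (u, v) with disparity d into 3D camera coordinates.

    Assumes a rectified pinhole stereo rig, so depth is Z = f * baseline / d.
    """
    u, v, d = (np.asarray(a, dtype=float) for a in (u, v, d))
    z = f * baseline / d
    x = (u - cx) * z / f
    y = (v - cy) * z / f
    return np.stack([x, y, z], axis=1)


def points_to_disparity(points, f, baseline, cx, cy):
    """Project 3D points back to pixel coordinates and disparity (inverse of the above)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = f * x / z + cx
    v = f * y / z + cy
    d = f * baseline / z
    return u, v, d


def estimate_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t minimizing ||dst - (src @ R.T + t)||."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det(R) = +1).
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def propagate_region(u, v, d, R, t, f, baseline, cx, cy):
    """Warp a key-frame region (pixels plus corrected disparities) into another frame."""
    pts = disparity_to_points(u, v, d, f, baseline, cx, cy)
    return points_to_disparity(pts @ R.T + t, f, baseline, cx, cy)
```

In practice, the correspondences feeding `estimate_rigid_transform` would come from noisy sources such as optical flow and the intermediate frame's automatic disparity estimate. A robust wrapper such as RANSAC, or a non-rigid model for deforming objects, would likely be needed. The plain least-squares fit is shown here only because it is the simplest correct instance of estimating a 3D transform from point pairs.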
