Spacetime Stereo and 3D Flow via Binocular Spatiotemporal Orientation Analysis

This paper presents a novel approach to recovering estimates of the 3D structure and motion of a dynamic scene from a sequence of binocular stereo images. The approach is based on matching spatiotemporal orientation distributions between the left and right temporal image streams; these distributions encapsulate both local spatial and temporal structure for disparity estimation. Because spatial and temporal structure are captured in this unified fashion, the two sources of information combine to yield disparity estimates that are naturally temporally coherent, while helping to resolve matches that might be ambiguous when either source is considered alone. Further, by allowing subsets of the orientation measurements to support different disparity estimates, an approach to recovering multilayer disparity from spacetime stereo is realized. Similarly, the matched distributions allow for direct recovery of dense, robust estimates of 3D scene flow. The approach has been implemented with real-time performance on commodity GPUs using OpenCL. Empirical evaluation shows that the proposed approach yields qualitatively and quantitatively superior estimates in comparison to various alternative approaches, and that it provides accurate multilayer estimates in the presence of (semi)transparent and specular surfaces.
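To make the idea of matching per-pixel distributions between the left and right streams concrete, the following is a minimal sketch and not the paper's implementation. It assumes rectified images, that each pixel already carries a normalized distribution (a hypothetical stand-in for the spatiotemporal orientation measurements), an L1 distance as the match cost, and a simple winner-take-all choice along one scanline. The names `l1_distance` and `match_scanline` are illustrative only.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two equal-length distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def match_scanline(left_row, right_row, max_disp):
    """Winner-take-all disparity along one rectified scanline.

    left_row, right_row: lists of per-pixel distributions (lists of floats).
    Assumption: left pixel x corresponds to right pixel x - d for disparity d.
    Returns one integer disparity per left pixel.
    """
    disparities = []
    for x in range(len(left_row)):
        best_d, best_cost = 0, float("inf")
        # Only test disparities that keep x - d inside the right scanline.
        for d in range(min(max_disp, x) + 1):
            cost = l1_distance(left_row[x], right_row[x - d])
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Toy usage: each right pixel has a distinct one-hot distribution, and the
# left scanline is the right scanline shifted by two pixels.
right = [[1.0 if k == i else 0.0 for k in range(8)] for i in range(8)]
left = right[-2:] + right[:-2]
print(match_scanline(left, right, max_disp=4))  # [0, 0, 2, 2, 2, 2, 2, 2]
```

The first two pixels have no valid match within the search range, so they fall back to disparity 0. A multilayer variant would let different subsets of the distribution's bins vote for different disparities rather than taking a single minimum; that step is not sketched here.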
