Scene flow by tracking in intensity and depth data

Scene flow describes the motion of each 3D point between two time steps. With the arrival of new depth sensors such as the Microsoft Kinect, it is now possible to compute scene flow with a single camera, with promising repercussions across a wide range of computer vision scenarios. We propose a novel method to compute scene flow by tracking in a Lucas-Kanade framework. Scene flow is estimated from a pair of aligned intensity and depth images; rather than computing a dense scene flow as most previous methods do, we obtain a set of 3D motion fields by tracking surface patches. Assuming local 3D rigidity of the scene, we propose a rigid translation flow model that solves directly for the scene flow by constraining the 3D motion field in both the intensity and the depth data. Our experiments yield very encouraging results. Since this approach solves simultaneously for the 2D tracking and for the scene flow, it can be plugged into existing 2D-tracking-based action recognition methods or used to define scene flow descriptors.
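To make the idea concrete, here is a minimal, hypothetical sketch of the kind of per-patch least-squares problem such a rigid translation flow model leads to. It is not the paper's implementation: the constraint derivation, the focal length `f`, and the sign conventions on the depth gradients are all assumptions. Each pixel in a patch contributes a linearized brightness-constancy equation and a depth-constancy equation, both expressed in the patch's single 3D translation `V = (Vx, Vy, Vz)` via the perspective flow of a translating point, `u = (f*Vx - x*Vz)/Z` and `v = (f*Vy - y*Vz)/Z`; stacking all pixels gives one small linear system per patch.

```python
import numpy as np

def rigid_translation_flow(x, y, Z, Ix, Iy, It, Zx, Zy, Zt, f=525.0):
    """Least-squares 3D translation V = (Vx, Vy, Vz) of a surface patch.

    Illustrative sketch (assumed model, not the paper's exact formulation).
    Per pixel, two linearized constraints:
      brightness constancy: Ix*u + Iy*v + It = 0
      depth constancy:      Zx*u + Zy*v + Zt = Vz
    with the image flow of a translating 3D point under perspective projection
      u = (f*Vx - x*Vz) / Z,   v = (f*Vy - y*Vz) / Z.
    All arguments are 1-D arrays over the patch's pixels; f is an assumed
    focal length in pixels.
    """
    fZ = f / Z
    # Brightness rows: [Ix*f/Z, Iy*f/Z, -(Ix*x + Iy*y)/Z] . V = -It
    A_b = np.stack([Ix * fZ, Iy * fZ, -(Ix * x + Iy * y) / Z], axis=1)
    b_b = -It
    # Depth rows: [Zx*f/Z, Zy*f/Z, -(Zx*x + Zy*y)/Z - 1] . V = -Zt
    A_d = np.stack([Zx * fZ, Zy * fZ, -(Zx * x + Zy * y) / Z - 1.0], axis=1)
    b_d = -Zt
    # Stack both constraint types and solve the over-determined system.
    A = np.vstack([A_b, A_d])
    b = np.concatenate([b_b, b_d])
    V, *_ = np.linalg.lstsq(A, b, rcond=None)
    return V
```

In a full tracker this solve would sit inside a coarse-to-fine Lucas-Kanade iteration, with a robust weighting (e.g. iteratively reweighted least squares) to down-weight pixels that violate the local rigidity assumption.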
