Towards Optimal Non-rigid Surface Tracking

This paper addresses the problem of optimal alignment of non-rigid surfaces from multi-view video observations to obtain a temporally consistent representation. Conventional non-rigid surface tracking performs frame-to-frame alignment which is subject to the accumulation of errors resulting in drift over time. Recently, non-sequential tracking approaches have been introduced which re-order the input data based on a dissimilarity measure. One or more input sequences are represented in a tree with reducing alignment path length. This limits drift and increases robustness to large non-rigid deformations. However, jumps may occur in the aligned mesh sequence where tree branches meet due to independent error accumulation. Optimisation of the tree for non-sequential tracking is proposed to minimise the errors in temporal consistency due to both the drift and jumps. A novel cluster tree enforces sequential tracking in local segments of the sequence while allowing global non-sequential traversal among these segments. This provides a mechanism to create a tree structure which reduces the number of jumps between branches and limits the length of branches. Comprehensive evaluation is performed on a variety of challenging non-rigid surfaces including faces, cloth and people. This demonstrates that the proposed cluster tree achieves better temporal consistency than the previous sequential and non-sequential tracking approaches. Quantitative ground-truth comparison on a synthetic facial performance shows reduced error with the cluster tree.

[1]  Martin Klaudiny,et al.  Cooperative patch-based 3D surface tracking , 2011, 2011 Conference for Visual Media Production.

[2]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[3]  Jean Ponce,et al.  Dense 3D motion capture from synchronized video streams , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Derek Bradley,et al.  High-quality passive facial performance capture using anchor frames , 2011, ACM Trans. Graph..

[5]  Takeo Kanade,et al.  Three-dimensional scene flow , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Slobodan Ilic,et al.  Free-form mesh tracking: A patch-based approach , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Hans-Peter Seidel,et al.  Animation cartography—intrinsic reconstruction of shape and motion , 2012, TOGS.

[8]  Steven M. Seitz,et al.  Spacetime faces , 2004, ACM Trans. Graph..

[9]  Martin Klaudiny,et al.  Global Non-rigid Alignment of Surface Sequences , 2013, International Journal of Computer Vision.

[10]  Adrian Hilton,et al.  Surface Capture for Performance-Based Animation , 2007, IEEE Computer Graphics and Applications.

[11]  Y. Aloimonos,et al.  Spatio-Temporal Stereo Using Multi-Resolution Subdivision Surfaces , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[12]  Hans-Peter Seidel,et al.  Efficient reconstruction of nonrigid shape and motion from real-time 3D scanner data , 2009, TOGS.

[13]  Hongbin Zha,et al.  Computer Vision - ACCV 2009, 9th Asian Conference on Computer Vision, Xi'an, China, September 23-27, 2009, Revised Selected Papers, Part III , 2010, Asian Conference on Computer Vision.

[14]  Adrian Hilton,et al.  Global temporal registration of multiple non-rigid surface sequences , 2011, CVPR 2011.

[15]  Olivier D. Faugeras,et al.  Multi-View Stereo Reconstruction and Scene Flow Estimation with a Global Image-Based Matching Score , 2007, International Journal of Computer Vision.

[16]  Adrian Hilton,et al.  Automatic 3D Video Summarization: Key Frame Extraction from Self-Similarity , 2008 .

[17]  Jean-Philippe Pons,et al.  Dense and Accurate Spatio-temporal Multi-view Stereovision , 2009, ACCV.

[18]  Kiriakos N. Kutulakos,et al.  Multi-view scene capture by surfel sampling: from video streams to non-rigid 3D motion, shape and reflectance , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.