Markerless structure-based multi-sensor calibration for free viewpoint video capture

Free-viewpoint capture technologies have recently started demonstrating impressive results. Capturing human performances in full 3D is promising for a variety of applications. However, setting up the capturing infrastructure is usually expensive and requires trained personnel. In this work we focus on one practical aspect of setting up a free-viewpoint capturing system: the spatial alignment of the sensors. Our work aims to simplify the external calibration process, which typically requires significant human intervention and technical knowledge. Our method uses an easy-to-assemble structure and, unlike similar works, does not rely on markers or features. Instead, we exploit a-priori knowledge of the structure's geometry to establish correspondences between the viewpoints with little overlap that are typical of free-viewpoint capture setups. These correspondences establish an initial sparse alignment that is then densely optimized. At the same time, our pipeline improves robustness to assembly errors, allowing non-technical users to calibrate multi-sensor setups. Our results showcase the feasibility of our approach, which can make the tedious calibration process easier and less error-prone.
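
To make the idea of a sparse alignment from correspondences concrete, the following is a minimal sketch, not the pipeline described above. It assumes that a handful of 3D points on the known structure have already been matched to their observations in one sensor's frame, and it estimates the rigid transform between the two sets with the standard SVD-based least-squares solution (Kabsch/Procrustes). The function name `rigid_align` and the example points are hypothetical; how correspondences are found and how the dense refinement is performed are not covered here.

```python
import numpy as np


def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # 3x3 cross-covariance between the centred sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


if __name__ == "__main__":
    # Hypothetical structure points (structure frame) and their observation
    # in a sensor frame, generated from a known pose for illustration only.
    rng = np.random.default_rng(0)
    structure_pts = rng.normal(size=(8, 3))
    angle = 0.7
    R_true = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.5, -1.0, 2.0])
    observed_pts = structure_pts @ R_true.T + t_true

    R, t = rigid_align(structure_pts, observed_pts)
    print("rotation error:", np.abs(R - R_true).max())
    print("translation error:", np.abs(t - t_true).max())
```

In a multi-sensor setup, one such transform per sensor would bring all sensors into the structure's coordinate frame; a dense refinement step could then start from these initial poses.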
