A multi-camera method for three-dimensional digitization of dynamic, real-world events
This thesis presents a method for the 3D digitization of dynamic, real-world events. This task requires sufficient temporal and spatial sampling to capture an event in its entirety, as well as the estimation of 3D shape and appearance over time. Direct sensing of global 3D structure is not feasible because the scene is in motion, and even range-scanning systems usually sample too coarsely or too slowly to capture dynamic events accurately. The method presented here overcomes this sensing problem by using a large, synchronized collection of calibrated video cameras.
Our 3D digitization method decomposes 3D shape recovery into the estimation of visible structure in each video frame, followed by the integration of visible structure into a complete 3D model. Visible surfaces are extracted using the multi-baseline stereo (MBS) algorithm. This implementation of MBS efficiently supports any number of cameras in general position through a novel rectification strategy for camera configurations that do not allow all images to be rectified to a single 3D plane. Stereo-computed range images are then integrated within a volumetric space using a novel integration strategy. Each range image is converted into a 3D mesh, and then into an implicit surface embedded in the volumetric space by encoding the signed distance between each 3D volume sample (voxel) and the 3D mesh. Multiple range images are integrated by accumulating the signed distances at each voxel. The resulting global surface model is then extracted by applying the Marching Cubes implicit-surface extraction algorithm.
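To make the visible-surface step concrete, the sketch below implements the standard SSSD-in-inverse-depth search that underlies multi-baseline stereo, assuming all images are already rectified so that disparity is proportional to baseline times inverse depth along one image axis. The function and parameter names are illustrative, not taken from the thesis.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mbs_inverse_depth(ref, others, baselines, focal, inv_depths, window=5):
    """Minimal SSSD-in-inverse-depth search over a set of rectified views.

    ref        : (H, W) reference intensity image (float)
    others     : list of (H, W) images from the remaining cameras
    baselines  : baseline of each camera relative to the reference
    inv_depths : array of candidate inverse depths to test
    Returns the (H, W) map of winning inverse depths.
    """
    sssd = np.zeros((len(inv_depths),) + ref.shape)
    for i, iz in enumerate(inv_depths):
        for img, b in zip(others, baselines):
            # In rectified geometry, disparity = focal * baseline * (1/z),
            # so every camera votes at the same inverse-depth coordinate.
            disparity = int(round(focal * b * iz))
            shifted = np.roll(img, disparity, axis=1)  # wrap-around simplification
            sssd[i] += uniform_filter((ref - shifted) ** 2, size=window)
    # The inverse depth minimizing the summed SSD wins at each pixel.
    return inv_depths[np.argmin(sssd, axis=0)]
```

Summing the SSD curves across all baselines before minimizing is what removes the depth ambiguity that any single camera pair would suffer.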
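The integration step can likewise be sketched in a few lines: each range image contributes a signed-distance field, the fields are averaged per voxel, and Marching Cubes extracts the zero level set. The sphere below is a stand-in for the expensive voxel-to-mesh distance computation, and the grid size and names are illustrative only.

```python
import numpy as np
from skimage import measure  # provides a Marching Cubes implementation

res = 64
xs = np.linspace(-1.0, 1.0, res)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")

accum = np.zeros((res, res, res))   # running sum of signed distances
weight = np.zeros((res, res, res))  # observations per voxel

def signed_distance(cx, cy, cz, radius=0.6):
    # Stand-in for "signed distance from voxel to a range image's 3D mesh":
    # positive inside the surface, negative outside.
    return radius - np.sqrt((X - cx)**2 + (Y - cy)**2 + (Z - cz)**2)

# Integrate three synthetic "range images" by accumulating per-voxel signed
# distance; a real implementation would also truncate the field to a band
# around each observed surface and handle unobserved voxels explicitly.
for c in [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.0, 0.05, 0.0)]:
    accum += signed_distance(*c)
    weight += 1.0

field = accum / weight
# The fused surface is the zero level set of the averaged field.
verts, faces, _, _ = measure.marching_cubes(field, level=0.0)
```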
The digitization of scene appearance uses the estimated 3D structure to determine visibility and the sampling of color in the original video images. A compact, global texture map is computed by mixing the color estimates, emphasizing those obtained from cameras viewing the local surface structure most directly. Alternatively, an image-based representation is derived from the global model by reprojecting the global structure back into each original camera to generate a per-camera visible surface model. New views are synthesized by separately rendering three of these models and mixing the rendered views directly on the virtual image plane.
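One plausible form of that view-dependent weighting, shown below, scales each camera's color sample by the cosine between the surface normal and the direction toward the camera, so head-on views dominate grazing ones. The exact mixing function in the thesis may differ, and all names here are illustrative.

```python
import numpy as np

def blend_colors(colors, cam_dirs, normal):
    """Mix per-camera color samples at one surface point, emphasizing
    cameras that view the local surface most directly.

    colors   : (K, 3) RGB samples from the K cameras that see the point
    cam_dirs : (K, 3) unit vectors from the point toward each camera
    normal   : (3,)   unit surface normal at the point
    """
    # Head-on views get weight near 1, grazing views near 0; cameras behind
    # the surface are clamped out. Assumes at least one camera sees the point.
    w = np.clip(cam_dirs @ normal, 0.0, None)
    return (w @ colors) / w.sum()

# Example: two near-frontal views dominate a grazing one.
colors = np.array([[200., 90., 60.], [190., 85., 55.], [120., 60., 40.]])
dirs = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.995, 0.0, 0.1]])
print(blend_colors(colors, dirs, np.array([0.0, 0.0, 1.0])))
```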
This thesis also presents extensive results of digitizing real events recorded in the 3D Dome, a recording studio employing a synchronized collection of 51 calibrated video cameras mounted on a 5-meter-diameter geodesic dome. This facility provides a workspace of approximately 8 cubic meters, sufficient to capture the entire motion of one or two people performing athletic actions, including swinging a baseball bat, bumping a volleyball, and passing a basketball. Results are presented for 5 different events, each 1-2 seconds long. Video is captured at 240x320 pixel resolution, with a volumetric modeling resolution of 1 cubic centimeter. The resulting models are used to generate both simple camera motions near the original viewpoints and camera motions deep into the event space, reaching views nearly impossible to capture with real cameras.