High-Resolution Performance Capture by Zoom-in Pan-Tilt Cameras

We have developed a system that uses multiple pan-tilt cameras to capture high-resolution videos of a moving person. The system controls the cameras so that each one captures the best view of the person (i.e., one of the body parts such as the head, torso, and limbs) according to criteria for camera-work optimization. To achieve this optimization in real time, time-consuming preprocessing that provides useful cues for the optimization is performed in a training stage. Specifically, the target performance (e.g., a dance) is captured to acquire the configuration of the body parts at each frame. In the actual capture stage, the system compares the shape reconstructed online with the shapes in the training data to quickly retrieve the configuration of the body parts. The retrieved configuration is then used by an efficient scheme for optimizing the camera work. Experimental results show that the camera work is optimized in accordance with the given criteria. High-resolution 3D videos produced by the proposed system are also shown as a typical use of the high-resolution videos.
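
The two-stage idea described above (retrieve a body-part configuration from training data, then assign cameras to body parts) can be illustrated with a minimal sketch. The abstract does not specify the shape descriptor, the matching measure, or the optimization criteria, so everything below is an assumption made for illustration: shapes are flattened binary voxel grids, matching uses Hamming distance, the hypothetical `view_score` simply prefers nearby cameras, and the assignment is found by exhaustive search rather than by the paper's efficient scheme.

```python
from itertools import permutations
import math

def hamming(a, b):
    """Count voxels whose occupancy differs between two flattened grids."""
    return sum(x != y for x, y in zip(a, b))

def retrieve_configuration(online_shape, training_set):
    """Return the body-part configuration stored with the nearest training shape."""
    best = min(training_set, key=lambda item: hamming(online_shape, item["shape"]))
    return best["parts"]

def view_score(camera_pos, part_pos):
    """Hypothetical criterion (an assumption): closer cameras score higher."""
    return -math.dist(camera_pos, part_pos)

def assign_cameras(cameras, parts):
    """Exhaustively search one-to-one camera-to-part assignments for the best total score.

    Assumes there are at least as many body parts as cameras; otherwise
    no assignment is enumerated and None is returned.
    """
    best_score, best_assignment = -math.inf, None
    for perm in permutations(list(parts), len(cameras)):
        score = sum(view_score(cam, parts[name])
                    for cam, name in zip(cameras.values(), perm))
        if score > best_score:
            best_score, best_assignment = score, dict(zip(cameras, perm))
    return best_assignment

# Toy training data: each entry pairs a voxel shape with body-part positions.
training_set = [
    {"shape": [1, 1, 0, 0], "parts": {"head": (0, 0, 2), "torso": (0, 0, 1)}},
    {"shape": [0, 1, 1, 0], "parts": {"head": (1, 0, 2), "torso": (1, 0, 1)}},
]
cameras = {"cam_a": (0, 0, 3), "cam_b": (0, 0, 0)}

parts = retrieve_configuration([1, 1, 0, 1], training_set)
assignment = assign_cameras(cameras, parts)
```

In this toy setup, the expensive step (labelling body parts in each training shape) happens offline, and the online step reduces to a nearest-neighbor lookup followed by a small combinatorial search, which mirrors the division of work between the training stage and the capture stage.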
