A real time system for robust 3D voxel reconstruction of human motions

We present a multi-PC/camera system that can perform 3D reconstruction and ellipsoid fitting of moving humans in real time. The system consists of five cameras. Each camera is connected to a PC which locally extracts the silhouettes of the moving person in the image captured by the camera. The five silhouette images are then sent, via local network, to a host computer to perform 3D voxel-based reconstruction by an algorithm called SPOT. Ellipsoids are then used to fit the reconstructed data. By using a simple and user-friendly interface, the user can display and observe, in real time and from any view-point, the 3D models of the moving human body. With a rate of higher than 15 frames per second, the system is able to capture non-intrusively, a sequence of human motions.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Jean Serra,et al.  Image Analysis and Mathematical Morphology , 1983 .

[3]  Michael Potmesil Generating octree models of 3D objects from their silhouettes in a sequence of images , 1987, Comput. Vis. Graph. Image Process..

[4]  Richard Szeliski,et al.  Rapid octree construction from image sequences , 1993 .

[5]  Edward Hunter,et al.  Vision based hand gesture interpretation using recursive estimation , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[6]  Takeo Kanade,et al.  Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking , 1994, ECCV.

[7]  Ioannis A. Kakadiaris,et al.  3D human body model acquisition from multiple views , 1995, Proceedings of IEEE International Conference on Computer Vision.

[8]  Alex Pentland,et al.  Real-time American Sign Language recognition from video using hidden Markov models , 1995 .

[9]  Tosiyasu L. Kunii,et al.  Model-based analysis of hand posture , 1995, IEEE Computer Graphics and Applications.

[10]  James Quentin Stafford-Fraser Video-Augmented Environments , 1996 .

[11]  Larry S. Davis,et al.  3-D model-based tracking of humans in action: a multi-view approach , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Jake K. Aggarwal,et al.  Tracking human motion using multiple cameras , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[13]  Alex Pentland,et al.  Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Takeo Kanade,et al.  Virtual ized reality: constructing time-varying virtual worlds from real world events , 1997 .

[15]  Vladimir Pavlovic,et al.  Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Saied Moezzi,et al.  Virtual View Generation for 3D Digital Video , 1997, IEEE Multim..

[17]  Takeo Kanade,et al.  Virtualized reality: constructing time-varying virtual worlds from real world events , 1997, Proceedings. Visualization '97 (Cat. No. 97CB36155).

[18]  Emanuele Trucco,et al.  Introductory techniques for 3-D computer vision , 1998 .

[19]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[20]  Olivier D. Faugeras,et al.  3D articulated models and multi-view tracking with silhouettes , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[22]  Ying Wu,et al.  Capturing articulated human hand motion: a divide-and-conquer approach , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[23]  Jin-Hyung Kim,et al.  An HMM-Based Threshold Model Approach for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  Zhengyou Zhang,et al.  Flexible camera calibration by viewing a plane from unknown orientations , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[25]  Yoichi Sato,et al.  Fast tracking of hands and fingertips in infrared images for augmented desk interface , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[26]  Y. Iwadate,et al.  Object segmentation for Z-keying using stereo images , 2000, WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000.

[27]  Michael Isard,et al.  Partitioned Sampling, Articulated Objects, and Interface-Quality Hand Tracking , 2000, ECCV.

[28]  Vivek A. Sujan,et al.  Sign language recognition using competitive learning in the HAVNET neural network , 2000, Electronic Imaging.

[29]  Ivan Laptev,et al.  Tracking of Multi-state Hand Models Using Particle Filtering and a Hierarchy of Multi-scale Image Features , 2001, Scale-Space.

[30]  Paulo R. S. Mendonça,et al.  Model-based 3D tracking of an articulated hand , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[31]  Ying Wu,et al.  Capturing natural hand articulation , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[32]  Bor-Shenn Jeng,et al.  Background subtraction based on logarithmic intensities , 2002, Pattern Recognit. Lett..

[33]  Yoichi Sato,et al.  Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.