Detection and tracking of humans for visual interaction

This thesis makes four main contributions to the field of computer vision. The first two are independent methods for locating and tracking parts of the human body, where the main interest is not 3D biometric accuracy but a representation discriminative enough for visual interaction.

Using a single uncalibrated camera, the first algorithm combines background suppression with a general approximation of body shape within a particle filter framework. To maintain real-time performance, integral images are used to evaluate particles rapidly.

The second method is a probabilistic framework for assembling detected body parts into a full 2D human configuration. The face, torso, legs and hands are detected in cluttered scenes using body part detectors trained with AdaBoost. Coarse heuristics eliminate obvious outliers, and body configurations are assembled from the remaining parts using RANSAC. An a priori mixture model of upper-body configurations provides a pose likelihood for each configuration. A joint likelihood is then formed by combining the pose, part detector and corresponding skin model likelihoods, and the assembly with the highest likelihood is selected.

The third development can be applied in conjunction with either of the above body part detection and tracking techniques. Once the body parts have been located, the a priori mixture model of upper-body configurations is used to disambiguate the subject's hands. The likely elbow positions are then estimated statistically, completing the upper-body pose.

The final development is a method for estimating the 3D pose of the upper body from a single camera. A database covering a variety of human movements is constructed from motion capture data. This data is used to animate a generic 3D human model, which is rendered to produce a database of frontal-view images. From this image database, three subsidiary databases of hand positions, silhouettes and edge maps are extracted. A candidate image is matched against these databases in real time, and the index of the subsidiary database triplet that yields the highest matching score is used to retrieve the corresponding 3D configuration from the motion capture data. This motion capture frame then provides the 3D positions of the hands for use in HCI, or can be used to render a 3D model.
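
The first method relies on integral images (summed-area tables) so that each particle can be scored in constant time. The following is a minimal sketch under stated assumptions, not the thesis implementation. It assumes a binary foreground mask produced by background suppression, models each particle as a hypothetical axis-aligned box (x, y, w, h) for a single body part, and uses a hypothetical likelihood: the fraction of foreground pixels inside the box. Once the table is built, any box sum costs four lookups regardless of box size.

```python
import random

def integral_image(mask):
    """Summed-area table padded with a zero row and column:
    ii[y][x] is the sum of mask over rows < y and columns < x."""
    h, w = len(mask), len(mask[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += mask[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the mask over the box with top-left (x, y), width w, height h, in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def weight_particles(ii, particles):
    """Hypothetical likelihood: foreground fraction inside each box, normalised to sum to 1."""
    weights = [box_sum(ii, x, y, w, h) / (w * h) for (x, y, w, h) in particles]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(particles)] * len(particles)
    return [wt / total for wt in weights]

def resample(particles, weights, rng):
    """Systematic resampling: draw len(particles) particles in proportion to their weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step
    cumulative, i, out = weights[0], 0, []
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out

# Usage: a 6x6 mask containing a 3x3 foreground blob with top-left corner (2, 2).
mask = [[1 if 2 <= x < 5 and 2 <= y < 5 else 0 for x in range(6)] for y in range(6)]
ii = integral_image(mask)
particles = [(2, 2, 3, 3), (0, 0, 3, 3), (3, 3, 3, 3)]
weights = weight_particles(ii, particles)   # ≈ [9/14, 1/14, 4/14]
resampled = resample(particles, weights, random.Random(0))
```

Padding the table with a zero row and column removes the boundary special cases in `box_sum`.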
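
The assembly step of the second method can be pictured as random sampling over candidate detections followed by a likelihood comparison. This sketch is a simplified, RANSAC-flavoured random search rather than a faithful reproduction of the thesis's procedure. It assumes each body part has a list of candidate detections carrying a detector log-likelihood and a skin log-likelihood (both hypothetical fields). It also assumes the joint log-likelihood is the sum of the pose, detector and skin log-likelihoods, which treats the three terms as independent. Each iteration samples one candidate per part and the best-scoring assembly is kept.

```python
import random
from typing import Callable, Dict, List, NamedTuple, Tuple

class Detection(NamedTuple):
    position: Tuple[float, float]
    detector_loglik: float   # hypothetical: log-likelihood reported by the part detector
    skin_loglik: float       # hypothetical: log-likelihood under a skin-colour model

def assemble(candidates: Dict[str, List[Detection]],
             pose_loglik: Callable[[Dict[str, Tuple[float, float]]], float],
             iterations: int, rng: random.Random):
    """Sample one detection per part per iteration; keep the highest joint log-likelihood."""
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        chosen = {part: rng.choice(dets) for part, dets in candidates.items()}
        positions = {part: d.position for part, d in chosen.items()}
        score = pose_loglik(positions) + sum(
            d.detector_loglik + d.skin_loglik for d in chosen.values())
        if score > best_score:
            best, best_score = chosen, score
    return best, best_score

# Usage with toy data: the pose term (hypothetical) prefers a hand 40 px below the face.
candidates = {
    "face": [Detection((50, 20), -0.1, -0.2)],
    "hand": [Detection((80, 60), -0.3, -0.1), Detection((10, 5), -0.2, -2.0)],
}
pose = lambda p: -0.01 * ((p["hand"][1] - p["face"][1]) - 40) ** 2
best, best_score = assemble(candidates, pose, iterations=50, rng=random.Random(0))
```

Working in log space turns the product of likelihoods into a sum and avoids numerical underflow when many parts are combined.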
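
The a priori mixture model of upper-body configurations is used both to score configurations and to disambiguate the hands. Here is a minimal sketch of both uses. It assumes a Gaussian mixture with diagonal covariances over a flattened vector of 2D part positions. It also assumes that disambiguating the hands means choosing between the two left/right labellings of the detected hands. The mixture parameters and the hypothetical layout (face, left hand, right hand) are illustrative; in practice they would be learnt from training data.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )

def log_mixture_likelihood(x, components):
    """log p(x) for a mixture given as (weight, mean, var) triples, using log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def label_hands(face, hand_a, hand_b, components):
    """Return (left, right) under whichever of the two labellings the pose prior prefers."""
    as_is = [*face, *hand_a, *hand_b]
    swapped = [*face, *hand_b, *hand_a]
    if log_mixture_likelihood(as_is, components) >= log_mixture_likelihood(swapped, components):
        return hand_a, hand_b
    return hand_b, hand_a

# Hypothetical two-component prior over (face_x, face_y, left_x, left_y, right_x, right_y).
# In a frontal view the subject's left hand appears on the image's right (larger x).
prior = [
    (0.6, [50.0, 20.0, 80.0, 60.0, 20.0, 60.0], [25.0] * 6),  # arms down
    (0.4, [50.0, 20.0, 85.0, 15.0, 15.0, 15.0], [25.0] * 6),  # arms raised
]

left, right = label_hands((50.0, 21.0), (22.0, 58.0), (79.0, 62.0), prior)
```

Here the detector's arbitrary ordering puts the hand at x = 22 first, but the swapped labelling fits the "arms down" component far better, so `left` becomes (79.0, 62.0).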
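
The abstract does not say how the elbow positions are estimated. One standard statistical choice is assumed here purely for illustration. Model the elbow coordinates e as jointly Gaussian with the coordinates o of the already-located parts (for example shoulders and hands), with means μ_e, μ_o and covariance blocks Σ_eo, Σ_oo learnt from training data. The most likely elbow positions are then the conditional mean:

```latex
\hat{\mathbf{e}} = \mathbb{E}[\mathbf{e} \mid \mathbf{o}]
  = \boldsymbol{\mu}_e + \Sigma_{eo}\,\Sigma_{oo}^{-1}\,(\mathbf{o} - \boldsymbol{\mu}_o)
```

With a mixture prior, the same formula would be applied within the most responsible component, or the per-component estimates would be blended by their posterior weights.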
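
For the final development, the lookup can be sketched as scoring every database entry against the candidate image's features and returning the motion capture frame of the best entry. This sketch is illustrative only, and the abstract does not state the exact matching measures or their weighting. It assumes edge maps and silhouettes are given as non-empty sets of pixel coordinates. Edges are compared with a brute-force chamfer distance, silhouettes with intersection-over-union, and hand positions with Euclidean distance, and the three terms are combined using hypothetical weights. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, NamedTuple, Set, Tuple

Point = Tuple[int, int]

class Features(NamedTuple):
    hands: Tuple[Point, Point]   # 2D hand positions
    silhouette: Set[Point]       # foreground pixels
    edges: Set[Point]            # edge pixels

def chamfer(a: Set[Point], b: Set[Point]) -> float:
    """Mean distance from each point of a to its nearest point of b (brute force; non-empty sets assumed)."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def iou(a: Set[Point], b: Set[Point]) -> float:
    """Intersection-over-union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def score(query: Features, entry: Features, w_hand=1.0, w_sil=10.0, w_edge=1.0) -> float:
    """Combined matching score, higher is better; the weights are hypothetical."""
    hand_err = sum(math.dist(p, q) for p, q in zip(query.hands, entry.hands))
    return (w_sil * iou(query.silhouette, entry.silhouette)
            - w_hand * hand_err
            - w_edge * chamfer(query.edges, entry.edges))

def best_frame(query: Features, database: List[Tuple[Features, int]]) -> int:
    """Return the motion capture frame index whose rendered features best match the query."""
    _, frame = max(database, key=lambda item: score(query, item[0]))
    return frame

# Usage with toy data: two rendered poses tagged with hypothetical mocap frame indices.
body = {(x, y) for x in range(3, 7) for y in range(2, 9)}
arms_down = Features(((2, 8), (8, 8)), body, {(3, 2), (6, 2), (3, 8), (6, 8)})
arms_up = Features(((2, 1), (8, 1)), {(x, y) for x in range(3, 7) for y in range(1, 9)},
                   {(3, 1), (6, 1), (3, 8), (6, 8)})
database = [(arms_down, 120), (arms_up, 455)]
query = Features(((2, 7), (8, 8)), body, {(3, 2), (6, 2), (3, 8), (6, 8)})
frame = best_frame(query, database)  # → 120
```

A real-time system would precompute a distance transform of the query edge map instead of the brute-force nearest-point search used here for clarity.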