Multi-view Body Part Recognition with Random Forests

Many computer vision tasks, such as object detection, pose estimation, and alignment, are directly related to the estimation of correspondences over instances of an object class. Other tasks, such as image classification and verification, can still benefit greatly from correspondence estimation where they are not completely solved. This thesis presents practical approaches for tackling the correspondence estimation problem, with an emphasis on deformable objects. The methods presented in this thesis vary greatly in their details, but they all combine generative and discriminative modeling to estimate correspondences from input images efficiently. While the methods described in this work are generic and can be applied to any object, our experiments focus on two object classes of high importance, namely human bodies and faces.

When dealing with the human body, we are mostly interested in estimating a sparse set of landmarks, specifically the locations of the body joints. We use pictorial structures to model the articulation of the body parts generatively, and we learn efficient discriminative models to localize the parts in the image. This is a common approach explored by many previous works. We further extend this hybrid approach by introducing higher-order terms to deal with the double-counting problem, and we provide an algorithm for solving the resulting non-convex problem efficiently. In another work, we explore multi-view pose estimation, where multiple calibrated cameras are available and the goal is to determine the 3D pose of a person by aggregating 2D information. This is done efficiently by discretizing the 3D search space and using the 3D pictorial structures model to perform inference.

In contrast to the human body, faces have a much more rigid structure, and it is relatively easy to detect the major parts of the face, such as the eyes, nose, and mouth. However, dense correspondence estimation on faces under varying poses and lighting conditions remains challenging. In a first work, we deal with this variation by partitioning the face into multiple parts and learning a separate regressor for each part. In another work, we take a fully discriminative approach and learn a global regressor from image to landmarks; to compensate for insufficient training data, we augment the training set with a large number of synthetic images. Although we have shown strong performance on standard face datasets, in many scenarios the RGB signal is distorted by poor lighting conditions and becomes almost unusable. This problem is addressed in another work, where we explore the use of the depth signal for dense correspondence estimation. Here again, a hybrid generative/discriminative approach is used to perform accurate correspondence estimation in real time.
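As a rough illustration of the multi-view inference summarized above, the Python sketch below discretizes the 3D search space into a grid, scores every grid point for each body part by projecting it into each calibrated camera and summing 2D part-detector scores, and then runs exact max-sum dynamic programming over a tree of parts. It is a minimal sketch under stated assumptions, not the method from the thesis: the quadratic limb-length deformation cost, the nearest-pixel score lookup, the toy cameras and score maps, and the function names `project`, `unary_scores` and `infer_pose` are all hypothetical, and no higher-order terms are modeled.

```python
import itertools
import math

# Hypothetical sketch of 3D pictorial-structures-style inference on a 3D grid.
# Cost terms, names and toy data are illustrative assumptions only.


def project(P, X):
    """Project 3D point X with a 3x4 camera matrix P; None if behind the camera."""
    Xh = (X[0], X[1], X[2], 1.0)
    x, y, w = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    if w <= 0:
        return None
    return x / w, y / w


def unary_scores(grid, cameras, score_maps):
    """Sum 2D part-detector scores over all views for every 3D grid point.

    score_maps[cam][part] is a 2D list indexed [row][col]. A point that projects
    outside an image (or behind a camera) gets no score from that view.
    """
    n_parts = len(score_maps[0])
    unary = [[0.0] * len(grid) for _ in range(n_parts)]
    for cam, P in enumerate(cameras):
        for i, X in enumerate(grid):
            uv = project(P, X)
            if uv is None:
                continue
            col, row = round(uv[0]), round(uv[1])  # nearest-pixel lookup
            for p in range(n_parts):
                img = score_maps[cam][p]
                if 0 <= row < len(img) and 0 <= col < len(img[0]):
                    unary[p][i] += img[row][col]
    return unary


def infer_pose(grid, unary, parents, rest_len, weight=1.0):
    """Exact max-sum inference on a tree of parts over the discretized grid.

    parents[0] must be -1 (root) and parents[c] < c for every other part, so a
    reverse sweep visits children before their parents. rest_len[c] is the
    preferred (hypothetical) distance between part c and its parent.
    """
    n_parts, n = len(parents), len(grid)
    belief = [list(u) for u in unary]  # subtree score with part p at grid[i]
    argmax_child = [None] * n_parts    # argmax_child[c][i]: best j given parent at i
    for c in range(n_parts - 1, 0, -1):
        p = parents[c]
        argmax_child[c] = []
        for i, Xp in enumerate(grid):
            # Child subtree score minus a quadratic limb-length penalty.
            scores = [belief[c][j] - weight * (math.dist(Xp, Xc) - rest_len[c]) ** 2
                      for j, Xc in enumerate(grid)]
            j_best = max(range(n), key=scores.__getitem__)
            argmax_child[c].append(j_best)
            belief[p][i] += scores[j_best]
    # Backtrack: place the root first, then each child given its parent.
    loc = [0] * n_parts
    loc[0] = max(range(n), key=belief[0].__getitem__)
    for c in range(1, n_parts):
        loc[c] = argmax_child[c][loc[parents[c]]]
    return [grid[i] for i in loc], belief[0][loc[0]]


# Toy example: two orthographic views and two parts (root 0, child 1).
cameras = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],  # view 1 sees (X, Y)
    [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]],  # view 2 sees (Z, Y)
]


def peak(row, col):
    """A 3x3 score map with a single detector response of 1.0."""
    img = [[0.0] * 3 for _ in range(3)]
    img[row][col] = 1.0
    return img


score_maps = [[peak(0, 1), peak(2, 1)],  # view 1: root map, child map
              [peak(0, 1), peak(2, 1)]]  # view 2: root map, child map
grid = list(itertools.product(range(3), repeat=3))
unary = unary_scores(grid, cameras, score_maps)
pose, score = infer_pose(grid, unary, parents=[-1, 0], rest_len=[0.0, 2.0])
print(pose, score)  # [(1, 0, 1), (1, 2, 1)] 4.0
```

Each part's unary score is accumulated across views, so a grid point scores highly only when the detectors agree in several cameras; this is the sense in which 2D information is aggregated in 3D. The brute-force message computation costs on the order of N² operations per limb for N grid points, so a sketch like this only suits coarse grids.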
