Cascaded pose regression

We present a fast and accurate algorithm for computing the 2D pose of objects in images called cascaded pose regression (CPR). CPR progressively refines a loosely specified initial guess, where each refinement is carried out by a different regressor. Each regressor performs simple image measurements that are dependent on the output of the previous regressors; the entire system is automatically learned from human annotated training examples. CPR is not restricted to rigid transformations: ‘pose’ is any parameterized variation of the object's appearance such as the degrees of freedom of deformable and articulated objects. We compare CPR against both standard regression techniques and human performance (computed from redundant human annotations). Experiments on three diverse datasets (mice, faces, fish) suggest CPR is fast (2–3ms per pose estimate), accurate (approaching human performance), and easy to train from small amounts of labeled data.

[1]  Volker Graefe,et al.  Dynamic monocular machine vision , 1988, Machine Vision and Applications.

[2]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[3]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[5]  Pascal Fua,et al.  Joint pose estimator and feature learning for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[7]  Takeo Kanade,et al.  A robust shape model for multi-view car alignment , 2009, CVPR.

[8]  Donald Geman,et al.  Coarse-to-Fine Face Detection , 2004, International Journal of Computer Vision.

[9]  Alan L. Yuille,et al.  Feature extraction from faces using deformable templates , 2004, International Journal of Computer Vision.

[10]  Pietro Perona,et al.  Monocular tracking of the human arm in 3D , 1995, Proceedings of IEEE International Conference on Computer Vision.

[11]  D. Geman,et al.  Stationary Features and Cat Detection , 2008 .

[12]  Dorin Comaniciu,et al.  Image based regression using boosting method , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[14]  Yunde Jia Description and recognition of curved objects , 1992, Proceedings., 11th IAPR International Conference on Pattern Recognition. Vol. III. Conference C: Image, Speech and Signal Analysis,.

[15]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[16]  P. Fua,et al.  Pose estimation for category specific multiview object localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[18]  Ankur Agarwal,et al.  3D human pose from silhouettes by relevance vector regression , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[19]  Demetri Terzopoulos,et al.  Snakes: Active contour models , 2004, International Journal of Computer Vision.

[20]  David P. Helmbold,et al.  Boosting Methods for Regression , 2002, Machine Learning.

[21]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Roland Göcke,et al.  A Nonlinear Discriminative Approach to AAM Fitting , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Pietro Perona,et al.  Pruning training sets for learning of object categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Vincent Lepetit,et al.  Fast Keypoint Recognition Using Random Ferns , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27]  Fabien Moutarde,et al.  Real-time visual detection of vehicles and pedestrians with new efficient adaBoost features , 2008 .

[28]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.