Joint pose estimator and feature learning for object detection

We present a new learning strategy for object detection. The proposed scheme removes the need to train a collection of detectors, each dedicated to a homogeneous family of poses, and instead learns a single classifier that has the inherent ability to deform based on the signal of interest. Specifically, we train a detector with a standard AdaBoost procedure, using combinations of pose-indexed features and pose estimators instead of the usual image features. This allows the learning process to select and combine various estimates of the pose with features able to implicitly compensate for variations in pose. We demonstrate that a detector built in this manner provides noticeable gains on two video sequences of hands, and we analyze its performance as these data sets are synthetically enriched in pose without being increased in size.
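The idea of boosting over pairs of pose estimator and pose-indexed feature can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the signals are toy 1-D arrays, the "pose" is the position of a bright pixel, and the names `est_argmax`, `est_fixed`, `feature`, and `OFFSETS` are hypothetical. Each weak learner is a decision stump on one (estimator, feature) pair, so plain AdaBoost decides which pose estimate to trust for which feature.

```python
import numpy as np

def make_sample(rng, positive, width=16):
    """Toy 1-D 'image': positives hold a pattern at a random shift (the pose)."""
    x = rng.normal(0.0, 0.3, width)
    if positive:
        shift = rng.integers(2, width - 3)
        x[shift] += 3.0       # bright pixel
        x[shift + 2] -= 3.0   # dark pixel two steps to the right
    return x

# Hypothetical pose estimators: each maps a signal to a guessed pose (an index).
def est_argmax(x):
    return int(np.argmax(x))

def est_fixed(x):
    return len(x) // 2  # ignores the signal, so it acts like an ordinary image feature

ESTIMATORS = [est_argmax, est_fixed]
OFFSETS = [-2, -1, 0, 1, 2]

def feature(x, pose, offset):
    """Pose-indexed feature: read the pixel at a location relative to the pose."""
    return x[(pose + offset) % len(x)]

def feature_matrix(X):
    """One column per (estimator, offset) pair."""
    cols = []
    for est in ESTIMATORS:
        poses = [est(x) for x in X]
        for off in OFFSETS:
            cols.append([feature(x, p, off) for x, p in zip(X, poses)])
    return np.array(cols).T

def stump_predict(F, j, thr, pol):
    return np.where(pol * (F[:, j] - thr) >= 0, 1, -1)

def best_stump(F, y, w):
    """Exhaustive search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(F.shape[1]):
        for thr in np.unique(F[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(F, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(F, y, rounds=5):
    """Discrete AdaBoost with labels in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(F, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * stump_predict(F, j, thr, pol))
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, F):
    score = sum(a * stump_predict(F, j, t, p) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

rng = np.random.default_rng(0)
y_train = np.array([1, -1] * 100)
X_train = [make_sample(rng, lab == 1) for lab in y_train]
y_test = np.array([1, -1] * 100)
X_test = [make_sample(rng, lab == 1) for lab in y_test]

model = adaboost(feature_matrix(X_train), y_train)
accuracy = (predict(model, feature_matrix(X_test)) == y_test).mean()
print("test accuracy:", accuracy)
print("selected (estimator, offset) columns:", [j for _, j, _, _ in model])
```

In this toy setting the columns built on `est_fixed` play the role of the usual image features, since the pattern rarely falls on the fixed location, while the columns built on `est_argmax` follow the pattern wherever it appears. The design choice the sketch is meant to show is that no separate detector per pose is trained: the pose estimate is consumed inside each weak learner.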
