All together now: Simultaneous Detection and Continuous Pose Estimation using a Hough Forest with Probabilistic Locally Enhanced Voting

Simultaneous object detection and pose estimation is a challenging task in computer vision. In this paper, we tackle the problem using Hough Forests. Unlike most methods in the literature, we focus on continuous pose estimation, and we aim for a probabilistic output. First, we introduce a new pose purity criterion for splitting a node during forest training. Second, we propose Probabilistic Locally Enhanced Voting (PLEV), a novel regression strategy that modulates the regression with a kernel density estimate in order to consolidate the votes in a local region near the maxima detected in the Hough space. Third, we propose a pose-based backprojection strategy to improve bounding box estimation. With these three additions, we show that our Hough Forest can achieve state-of-the-art results without requiring 3D CAD models. The method is versatile: we show results for different object categories (cars and faces) and for different modalities (RGB and depth images).
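To make the idea of consolidating votes near a Hough maximum with a kernel density estimate more concrete, here is a minimal sketch of a generic accumulate-then-refine scheme. It is not the authors' PLEV implementation. It assumes that each vote carries an object-centre position, a single continuous pose angle (e.g. azimuth in radians) and a weight. It also assumes a von Mises kernel for the periodic angle and a simple grid accumulator for the Hough space. The names `Vote`, `hough_maximum`, `local_pose_density` and the parameters `cell`, `radius`, `kappa` and `bins` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vote:
    x: float       # voted object-centre x (pixels); hypothetical vote layout
    y: float       # voted object-centre y (pixels)
    angle: float   # voted continuous pose, e.g. azimuth in radians
    weight: float  # confidence of the vote (e.g. from the leaf statistics)


def hough_maximum(votes, cell=8.0):
    """Accumulate vote weights on a coarse grid; return the centre of the strongest cell."""
    acc = {}
    for v in votes:
        key = (int(v.x // cell), int(v.y // cell))
        acc[key] = acc.get(key, 0.0) + v.weight
    (i, j), _ = max(acc.items(), key=lambda kv: kv[1])
    return (i + 0.5) * cell, (j + 0.5) * cell


def local_pose_density(votes, centre, radius=10.0, kappa=8.0, bins=360):
    """Weighted von Mises KDE over the poses of votes within `radius` of `centre`.

    Returns (mode_angle, density), where density is sampled on `bins` equally
    spaced angles in [0, 2*pi) and normalised numerically to integrate to 1.
    """
    cx, cy = centre
    local = [v for v in votes if math.hypot(v.x - cx, v.y - cy) <= radius]
    if not local:
        raise ValueError("no votes near the Hough maximum")
    step = 2 * math.pi / bins
    grid = [k * step for k in range(bins)]
    # Unnormalised kernel sum; the cosine makes the kernel periodic in the angle.
    dens = [sum(v.weight * math.exp(kappa * math.cos(t - v.angle)) for v in local)
            for t in grid]
    z = sum(dens) * step
    dens = [d / z for d in dens]
    best = max(range(bins), key=dens.__getitem__)
    return grid[best], dens


votes = [
    Vote(100, 50, 0.50, 1.0),
    Vote(98, 49, 0.45, 1.0),
    Vote(102, 51, 0.55, 1.0),
    Vote(300, 300, 2.50, 1.0),  # clutter far from the object
    Vote(301, 302, 2.60, 1.0),
]
centre = hough_maximum(votes)
pose, density = local_pose_density(votes, centre)
print(centre, round(pose, 3))
```

Only the votes that land near the detected maximum contribute to the pose estimate. The clutter votes are therefore ignored, and the returned density can be read as a distribution over pose rather than a single point estimate. The kernel concentration `kappa` plays the role of a bandwidth and is fixed here purely for illustration.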
