Feature Harvesting for Tracking-by-Detection

We propose a fast approach to 3–D object detection and pose estimation that owes its robustness to a training phase during which the target object slowly moves with respect to the camera. No additional information is provided to the system, save a very rough initialization in the first frame of the training sequence. It can be used to detect the target object in each video frame independently. Our approach relies on a Randomized Tree-based approach to wide-baseline feature matching. Unlike previous classification-based appro- aches to 3–D pose estimation, we do not require an a priori 3–D model. Instead, our algorithm learns both geometry and appearance. In the process, it collects, or harvests, a list of features that can be reliably recognized even when large motions and aspect changes cause complex variations of feature appearances. This is made possible by the great flexibility of Randomized Trees, which lets us add and remove feature points to our list as needed with a minimum amount of extra computation.

[1]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[2]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[3]  David G. Lowe,et al.  Shape indexing using approximate nearest-neighbour search in high-dimensional spaces , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  James J. Little,et al.  Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks , 2002, Int. J. Robotics Res..

[5]  Andrew J. Davison,et al.  Real-time simultaneous localisation and mapping with a single camera , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Wolfgang Heidrich,et al.  Cloth Motion Capture , 2003, SIGGRAPH '03.

[7]  David G. Lowe,et al.  Scene modelling, recognition and tracking with invariant image features , 2004, Third IEEE and ACM International Symposium on Mixed and Augmented Reality.

[8]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[9]  Rakesh Gupta,et al.  Multiple View Feature Descriptors from Image Sequences via Kernel Principal Component Analysis , 2004, ECCV.

[10]  Raphaël Marée,et al.  Random subwindows for robust image classification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Vincent Lepetit,et al.  Monocular Model-Based 3D Tracking of Rigid Objects: A Survey , 2005, Found. Trends Comput. Graph. Vis..

[12]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[13]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Vincent Lepetit,et al.  Monocular Model-based 3d Tracking of Rigid Objects (Foundations and Trends in Computer Graphics and Vision(R)) , 2005 .

[16]  Jiri Matas,et al.  Matching with PROSAC - progressive sample consensus , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Vincent Lepetit,et al.  Keypoint recognition using randomized trees , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.