Multisensory embedded pose estimation

We present a multisensory method for estimating the motion of a mobile phone between two images taken with its camera. Pose estimation is a necessary step for applications such as 3D reconstruction and panorama construction, but detecting and matching robust features can be computationally expensive. In this paper we propose a method that combines the inertial sensors (accelerometers and gyroscopes) of a mobile phone with its camera to provide fast and accurate pose estimation. We use the inertial-based pose to warp the two images into the same perspective frame. We then employ an adaptive FAST feature detector and use illumination-normalized image patches as feature descriptors. Because the warped images are approximately aligned, the search for matching key-points becomes faster and, in certain cases, more reliable. Our results show that incorporating the inertial sensors considerably speeds up the detection and matching of key-points between the two images, which is the most time-consuming step of pose estimation.
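As a concrete illustration of this pipeline, the sketch below (Python with OpenCV) warps the second image into the first image's frame with a pure-rotation homography built from a gyro-derived rotation, detects FAST corners, and matches them by normalized-patch correlation within a small search window. This is a minimal sketch, not the authors' implementation: the intrinsics `K`, rotation `R`, patch size, correlation threshold, and search radius are illustrative assumptions, and the paper's adaptive FAST threshold selection is replaced here by a fixed threshold.

```python
import cv2
import numpy as np

def rotation_homography(K, R):
    # Pure-rotation homography mapping image 2 into image 1's frame: H = K R K^-1.
    return K @ R @ np.linalg.inv(K)

def normalized_patch(gray, pt, half=4):
    # Extract a zero-mean, unit-norm patch around pt (illumination-normalized descriptor).
    x, y = int(round(pt[0])), int(round(pt[1]))
    if x < half or y < half or x >= gray.shape[1] - half or y >= gray.shape[0] - half:
        return None  # too close to the image border
    patch = gray[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)
    patch -= patch.mean()
    n = np.linalg.norm(patch)
    return patch / n if n > 0 else None

def match_keypoints(img1, img2, K, R, fast_threshold=20, radius=8.0):
    # Warp img2 toward img1 using the gyro rotation, then match FAST corners
    # by normalized cross-correlation of patches within a small neighborhood.
    h, w = img1.shape[:2]
    H = rotation_homography(K, R)
    warped = cv2.warpPerspective(img2, H, (w, h))

    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(warped, cv2.COLOR_BGR2GRAY)

    fast = cv2.FastFeatureDetector_create(threshold=fast_threshold)
    kp1, kp2 = fast.detect(g1), fast.detect(g2)

    matches = []
    for a in kp1:
        pa = normalized_patch(g1, a.pt)
        if pa is None:
            continue
        best, best_score = None, 0.8  # minimum acceptable correlation
        for b in kp2:
            # The images are pre-aligned, so only a small window needs searching.
            if abs(a.pt[0] - b.pt[0]) > radius or abs(a.pt[1] - b.pt[1]) > radius:
                continue
            pb = normalized_patch(g2, b.pt)
            if pb is None:
                continue
            score = float((pa * pb).sum())  # NCC of two unit-norm patches
            if score > best_score:
                best, best_score = b, score
        if best is not None:
            matches.append((a.pt, best.pt))
    return matches
```

Restricting the correspondence search to a small radius is what the inertial pre-alignment buys: without the warp, the search window would have to cover the full inter-frame motion, which is where most of the matching cost goes.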
