Monocular Model-Based 3D Tracking of Rigid Objects: A Survey

Many applications require tracking of complex 3D objects. These include visual servoing of robotic arms on specific target objects, Augmented Reality systems that require real-time registration of the object to be augmented, and head tracking systems that sophisticated interfaces can use. Computer Vision offers solutions that are cheap, practical and non-invasive.

This survey reviews the different techniques and approaches that have been developed by industry and research. First, important mathematical tools are introduced: camera representation, robust estimation and uncertainty estimation. Then a comprehensive study is given of the numerous approaches developed by the Augmented Reality and Robotics communities, beginning with those that are based on point or planar fiducial marks and moving on to those that avoid the need to engineer the environment by relying on natural features such as edges, texture or interest points. Recent advances that avoid manual initialization and failures due to fast motion are also presented. The survey concludes with the different possible choices that should be made when implementing a 3D tracking system and a discussion of the future of vision-based 3D tracking.

Because it encompasses many computer vision techniques from low-level vision to 3D geometry and includes a comprehensive study of the massive literature on the subject, this survey should be the handbook of the student, the researcher, or the engineer who wants to implement a 3D tracking system.
