Multi-scale feature tracking and motion estimation

This thesis studies the problems of feature tracking and motion estimation and presents an application of these concepts to human-computer interaction. The presentation is divided into three parts.
The first part addresses feature tracking in a multi-scale context. Features in an image appear at different scales, and these scales can be expected to change over time because objects vary in apparent size as they move relative to the camera. A feature tracking scheme is presented that incorporates a mechanism for automatic scale selection, and it is argued that such a mechanism is necessary to handle size variations over time. Experiments demonstrate that the proposed scheme is robust to size variations in situations where a traditional fixed-scale tracker fails. This leads to extended feature trajectories, which are valuable for motion and structure estimation. It is also shown how an object representation suitable for tracking can be built in a conceptually simple way as a multi-scale feature hierarchy with qualitative relations between features at different scales. Experiments illustrate the capability of the proposed hierarchy to handle occlusions and semi-rigid objects.
The second part of the thesis develops a geometric framework for computing estimates of 3D structure and motion from sparse feature correspondences in monocular sequences. A tool called the centered affine trifocal tensor is presented for motion estimation from three affine views. Moreover, a factorization approach is developed that simultaneously handles point and line correspondences in multiple affine views. Experiments show the influence of several factors on the accuracy of the structure and motion estimates, including noise in the feature localization, perspective effects, and the number of feature correspondences. This motion estimation framework is also applied to feature correspondences obtained from the feature tracker developed in the first part.
The last part integrates the functionality of the first two parts into a pre-prototype system that explores new principles for human-computer interaction. The idea is to transfer 3D orientation to a computer using no equipment other than the operator’s hand.
