Appearance modeling under geometric context for object tracking and recognition: Dissertation Proposal

In the computer vision literature, most object recognition algorithms are based on shape models or on the appearance inside a rectangular box. Only recently has appearance modeling inside arbitrary contours been undertaken. It becomes particularly important when we need to fingerprint the appearance of humans with articulated motion or of vehicles under arbitrary poses. Essentially, we need a framework that models appearance under a given shape context or combines appearance and shape information. In this paper, we propose a unifying framework, based on a general definition of the Geometric Transform (GeT), for modeling appearance under geometric context. Geometric context refers to prior geometric information; it can be based on a model, inferred from the contour, or derived from prior knowledge of the motion. The GeT incorporates the geometric context by applying designed functionals over certain geometric sets of an image. We show that linear and non-linear image transformations, the Radon Transform, and the Trace Transform are special cases of the GeT. We also propose some innovative ways of generating the geometric sets, such as from the contour boundary or from skeletons of the shape, rather than simply from feature points as in the Active Appearance Model (AAM) [3]. For the case in which only sets of straight lines are used, as in the Radon Transform, we propose a multi-resolution representation that combines both shape and appearance information. We test our methods by classifying pedestrians according to their appearance. An example of registering a 3D vehicle model for 3D tracking is also illustrated. When fingerprinting vehicles in video, we often need to extract a 3D model of the vehicle before modeling its appearance across different views. Therefore, we propose a factorization approach for structure from planar motion that can be used to reconstruct a 3D model of a vehicle in surveillance videos.
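To make the idea of "applying functionals over geometric sets" concrete, here is a minimal sketch in Python. It is not the formulation used in the proposal: the helper names `line_sets` and `geometric_transform`, the nearest-integer discretization of lines, and the choice of functionals are all assumptions made for illustration. With straight lines as the geometric sets, a sum functional gives a Radon-like projection, and swapping in another functional gives a Trace-like one.

```python
import numpy as np

def line_sets(shape, thetas):
    """Hypothetical helper: partition the pixels into discrete straight lines.

    For each angle theta, pixel (row, col) is assigned to the line with
    offset rho = round(col*cos(theta) + row*sin(theta)).
    Returns a dict mapping (theta_index, rho) -> (rows, cols) index arrays.
    """
    rows, cols = np.indices(shape)
    rows, cols = rows.ravel(), cols.ravel()
    sets = {}
    for i, theta in enumerate(thetas):
        rho = np.rint(cols * np.cos(theta) + rows * np.sin(theta)).astype(int)
        for value in np.unique(rho):
            mask = rho == value
            sets[(i, int(value))] = (rows[mask], cols[mask])
    return sets

def geometric_transform(image, sets, functional):
    """Apply `functional` to the pixel values lying on each geometric set."""
    return {key: functional(image[r, c]) for key, (r, c) in sets.items()}

# Toy image; angle 0 yields image columns, angle pi/2 yields image rows.
image = np.arange(12, dtype=float).reshape(3, 4)
sets = line_sets(image.shape, thetas=[0.0, np.pi / 2])

radon_like = geometric_transform(image, sets, np.sum)  # sum functional
trace_like = geometric_transform(image, sets, np.max)  # a different functional
```

The sets here are lines only for simplicity; sets derived from a contour boundary or a shape skeleton could be passed to `geometric_transform` in the same `(rows, cols)` form.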
Compared with the factorization method of [20] for general motion, our approach has three major differences: (1) a different measurement matrix, specialized for planar motion, is formed, and it has rank at most 3 instead of 4; (2) the measurement matrix needs similar scalings, but estimation of fundamental matrices or epipoles is not needed; and (3) we obtain a Euclidean reconstruction instead of a projective reconstruction. The camera is not required to be calibrated; a simple semi-automatic calibration method using vanishing points and lines is sufficient. Experimental results show that the algorithm is accurate and fairly robust to noise and inaccurate calibration.
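The rank constraint on the measurement matrix can be illustrated with a generic truncated-SVD factorization. This is only a sketch under stated assumptions: the measurement matrix below is synthetic, the function name `rank3_factorize` is hypothetical, and the planar-motion construction, the scaling step, and the upgrade to a Euclidean reconstruction described above are not reproduced. The recovered factors are determined only up to an invertible 3x3 transformation.

```python
import numpy as np

def rank3_factorize(W):
    """Factor W (2F x P) into M_hat (2F x 3) and S_hat (3 x P) by truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:3])             # split singular values between both factors
    M_hat = U[:, :3] * root           # motion-like factor
    S_hat = root[:, None] * Vt[:3]    # shape-like factor
    return M_hat, S_hat, s

# Synthetic, noise-free stand-in for a measurement matrix (assumption):
# F frames, P tracked points, exact rank 3 by construction.
rng = np.random.default_rng(0)
F, P = 6, 20
M_true = rng.standard_normal((2 * F, 3))
S_true = rng.standard_normal((3, P))
W = M_true @ S_true

M_hat, S_hat, singular_values = rank3_factorize(W)
residual = np.linalg.norm(W - M_hat @ S_hat)
```

With noisy measurements, the fourth and later singular values would no longer vanish, and their size relative to the first three gives a rough indication of how well the rank-3 model fits.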

[1] Sudeep Sarkar, et al. The gait identification challenge problem: data sets and baseline algorithm, 2002, Object recognition supported by user interaction for service robots.

[2] Rama Chellappa, et al. Characterization of Human Faces under Illumination Variations Using Rank, Integrability, and Symmetry Constraints, 2004, ECCV.

[3] Geoffrey D. Sullivan, et al. A Simple, Intuitive Camera Calibration Tool for Natural Images, 1994, BMVC.

[4] Anil K. Jain. Fundamentals of Digital Image Processing, 2018, Control of Color Imaging Systems.

[5] John Oliensis, et al. A Critique of Structure-from-Motion Algorithms, 2000, Comput. Vis. Image Underst.

[6] Shigang Li, et al. Determining of camera rotation from vanishing points of lines on horizontal planes, 1990, Proceedings Third International Conference on Computer Vision.

[7] Bernhard P. Wrobel, et al. Multiple View Geometry in Computer Vision, 2001.

[8] George Wolberg, et al. Digital image warping, 1990.

[9] Matthew Brand, et al. Incremental Singular Value Decomposition of Uncertain Data with Missing Values, 2002, ECCV.

[10] Avinash C. Kak, et al. Principles of computerized tomographic imaging, 2001, Classics in applied mathematics.

[11] Thomas Vetter, et al. Face Recognition Based on Fitting a 3D Morphable Model, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[12] Jean Ponce, et al. Computer Vision: A Modern Approach, 2002.

[13] John Oliensis. Structure from linear or planar motions, 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14] Wen-Hsiang Tsai, et al. Camera Calibration by Vanishing Lines for 3-D Computer Vision, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[15] Tieniu Tan, et al. 3D structure and motion estimation from 2D image sequences, 1993, Image Vis. Comput.

[16] A. Murat Tekalp, et al. Error Characterization of the Factorization Method, 2001, Comput. Vis. Image Underst.

[17] L. Ehrenpreis. The Universality of the Radon Transform, 2003.

[18] Azriel Rosenfeld, et al. Computer Vision, 1988, Adv. Comput.

[19] René Vidal, et al. Structure from Planar Motions with Small Baselines, 2002, ECCV.

[20] Alexander Kadyrov, et al. Affine invariant features from the trace transform, 2004, IEEE Trans. Pattern Anal. Mach. Intell.

[21] Anil K. Jain, et al. Feature extraction methods for character recognition-A survey, 1996, Pattern Recognit.

[22] Peter F. Sturm, et al. A Factorization Based Algorithm for Multi-Image Projective Structure and Motion, 1996, ECCV.

[23] Harry Shum, et al. Constrained planar motion analysis by decomposition, 2004, Image Vis. Comput.

[24] Takeo Kanade, et al. Shape and motion from image streams under orthography: a factorization method, 1992, International Journal of Computer Vision.

[25] Daniel D. Morris, et al. Factorization methods for structure from motion, 1998, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.