Geometry and photometry in three-dimensional visual recognition

This thesis addresses the problem of visual recognition under two sources of variability: geometric and photometric. The geometric part concerns the relation between 3D objects and their views under parallel, perspective, and central projection. The photometric part concerns the relation between 3D matte objects and their images under changing illumination conditions. Together, the two parts yield an alignment-based method for recognizing objects viewed from arbitrary viewing positions and illuminated by arbitrary configurations of light sources. In the first part of the thesis we show that a relative non-metric structure invariant that holds under both parallel and central projection models can be defined relative to four points in space and, moreover, can be uniquely recovered from two views regardless of whether either view was created by parallel or by central projection. As a result, we propose a method that is useful for recognition (via alignment) and for structure from motion, and that has the following properties: (i) the transition between projection models is natural and transparent, (ii) camera calibration is not required, and (iii) structure is defined relative to the object and does not involve the center of projection. The second part of this thesis addresses the photometric aspect of recognition under changing illumination. First, we argue that image properties alone do not appear to be generally sufficient for dealing with the effects of changing illumination; we propose a model-based approach instead. Second, we observe that the process responsible for factoring out the illumination during recognition appears to require more than contour information, but only slightly more. Combining these observations, we introduce a model-based alignment method that compensates for the effects of changing illumination by linearly combining model images of the object.
The model images, each taken under a different illumination condition, can be combined to reproduce novel images of the object regardless of whether the image is represented by grey-values, sign-bits, or other forms of reduced representation. The third part of this thesis addresses the problem of achieving full correspondence between model views and combines the geometric and photometric components into a single recognition system. The method for achieving correspondence combines affine or projective geometry with optical flow techniques in a single working framework. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
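
The idea of compensating for illumination by linearly combining model images can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an idealized matte surface with no cast or attached shadows, uses synthetic per-pixel normals and albedo, and solves for the combination coefficients by ordinary least squares. All names (`normals`, `albedo`, `render`, `model_lights`, `novel_light`) are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic matte surface: one unit normal and one albedo per pixel.
n_pixels = 500
normals = rng.normal(size=(n_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=n_pixels)


def render(light):
    """Idealized matte image: albedo * (normal . light), with no shadow clamping."""
    return albedo * (normals @ light)


# Three model images taken under linearly independent light directions (assumed).
model_lights = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [-0.5, -0.5, 1.0],
])
model_images = np.stack([render(l) for l in model_lights], axis=1)  # (n_pixels, 3)

# A novel image of the same surface under a different light.
novel_light = np.array([0.3, -0.2, 0.9])
novel_image = render(novel_light)

# Solve for the coefficients that best express the novel image
# as a linear combination of the three model images.
coeffs, *_ = np.linalg.lstsq(model_images, novel_image, rcond=None)

reconstruction = model_images @ coeffs
max_error = np.max(np.abs(reconstruction - novel_image))
print("coefficients:", coeffs)
print("max reconstruction error:", max_error)
```

Under these assumptions the reconstruction error is at the level of floating-point round-off, because every shadow-free matte image of the surface lies in the three-dimensional space spanned by the model images. Real images with shadows, specularities, or noise would only approximately satisfy this relation.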