A computational study of rigid motion perception (artificial intelligence, computer vision)

The interpretation of visual motion is investigated. The task of motion perception is divided into two major subtasks: (i) estimation of two-dimensional retinal motion, and (ii) computation of the parameters of rigid motion from retinal motion. Retinal motion estimation is performed using a point matching algorithm based on local similarity of matches and a global clustering strategy. The clustering technique unifies the notions of matching and motion segmentation and provides insight into the complexity of the matching and segmentation process. The constraints governing the computation of the rigid motion parameters from retinal motion are investigated. The emphasis is on determining the possible ambiguities of interpretation and how to remove them. This theoretical analysis forms the basis of a set of algorithms for computing structure and three-dimensional motion parameters from retinal displacements. The algorithms are experimentally evaluated. The main difficulties facing the computation are seen to be nonlinearity and a high-dimensional search space of solutions. To alleviate these difficulties, an active tracking method is proposed. This is a closed-loop system for evaluating the motion parameters. It is shown that under such a regime it is possible to obtain closed-form solutions for the motion parameters. This leads to a robust cooperative algorithm for motion perception requiring a minimal amount of retinal motion matching. The central theme of this research has been the evaluation of a hierarchical model for visual motion perception. To this end, the investigations revolved around three primary issues: (a) retinal motion computation from intensity images; (b) the conditions under which three-dimensional motion may be computed from retinal motion, and the efficacy of algorithms that perform such computation; and (c) the active vision, or closed-loop, approach to visual motion interpretation and the advantages it offers.
This thesis records fundamental contributions pertaining to these issues.
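
As a rough illustration of why recovering rigid motion from retinal displacements can be ambiguous, the sketch below projects a few 3D points before and after a rigid motion and compares the resulting retinal displacements with those of a scene whose depths and translation are both scaled by the same factor. It is not taken from the thesis: the pinhole camera with unit focal length, the rotation about the vertical axis, and all names (`rotate_y`, `project`, `retinal_displacement`) and numeric values are assumptions chosen for the example. The scale ambiguity shown is a standard property of perspective projection and may or may not be among the ambiguities the thesis analyzes.

```python
import math

def rotate_y(p, angle):
    """Rotate a 3D point about the vertical (y) axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(p, focal=1.0):
    """Perspective (pinhole) projection onto the image plane; assumed camera model."""
    x, y, z = p
    return (focal * x / z, focal * y / z)

def retinal_displacement(p, angle, t):
    """Image displacement of point p under rotation `angle` followed by translation t."""
    q = rotate_y(p, angle)
    q = (q[0] + t[0], q[1] + t[1], q[2] + t[2])
    before = project(p)
    after = project(q)
    return (after[0] - before[0], after[1] - before[1])

# Hypothetical scene and motion (values chosen only for illustration).
points = [(0.5, 0.2, 4.0), (-1.0, 0.3, 6.0), (0.1, -0.8, 5.0)]
angle = 0.02
translation = (0.1, 0.0, -0.2)

# Scale the whole scene and the translation by the same factor k.
k = 3.0
scaled_points = [(k * x, k * y, k * z) for (x, y, z) in points]
scaled_translation = tuple(k * c for c in translation)

flow = [retinal_displacement(p, angle, translation) for p in points]
flow_scaled = [retinal_displacement(p, angle, scaled_translation)
               for p in scaled_points]

# The two displacement fields agree up to floating-point rounding, so retinal
# motion alone cannot fix the absolute scale of depth and translation.
max_diff = max(abs(a - b)
               for f, g in zip(flow, flow_scaled)
               for a, b in zip(f, g))
print(max_diff)
```

Because the two fields coincide, any algorithm working from retinal displacements alone can recover translation and depth only up to a common scale factor; the rotation is unaffected by this scaling.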