An Analysis of Lowe's Model-based Vision System

As in several other model-based vision systems, the initial phase of Lowe's method applies an edge detector to the input image and extracts line segments from the result. This gives a set of k observed segments, which have to be matched to the n segments in the model. This matching process can be viewed as a search through an "interpretation tree" containing (n + 1)^k nodes. For any reasonable values of k and n, this tree contains a very large number of nodes, so a sophisticated search strategy is needed to complete the search in a reasonable time. The search strategy developed by Lowe starts by grouping the observed line segments into significant multisegment structures called perceptual groups. These "observed" groups are then matched to groups known to exist in the model. Matching groups first, instead of individual segments, prunes the interpretation tree substantially. Furthermore, since perceptual groups containing three or more segments are used to initialize the search, it is possible at each stage of the search to solve for the object's pose parameters (i.e., location and orientation). Using this estimate of pose, the position of each model segment's image can be predicted. Subbranches of the interpretation tree are considered only if the observed segments are in close proximity to the predicted image of their assigned model segment. These two factors lead to a very efficient search strategy.
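
The effect of proximity-based pruning on the interpretation tree can be illustrated with a minimal sketch. It is not Lowe's implementation, and it makes several simplifying assumptions: each segment is reduced to its midpoint, the pose is taken as already estimated and fixed (so the predicted image positions are simply given as input), and an observed segment may be left unmatched. The names tree_size, pruned_search, and tol are hypothetical and chosen only for this illustration.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Interpretation = List[Optional[int]]


def tree_size(n: int, k: int) -> int:
    """Count complete assignments when each of k observed segments
    takes one of n model segments or is left unmatched."""
    return (n + 1) ** k


def pruned_search(observed: List[Point],
                  predicted: List[Point],
                  tol: float) -> List[Interpretation]:
    """Depth-first search over assignments of observed segments
    (midpoints) to model segments (predicted midpoints).

    A branch is followed only if the observed midpoint lies within
    `tol` of the predicted position of its assigned model segment.
    Returns every complete interpretation that survives this test.
    """
    results: List[Interpretation] = []

    def extend(assign: Interpretation) -> None:
        i = len(assign)
        if i == len(observed):
            results.append(list(assign))
            return
        ox, oy = observed[i]
        for j, (px, py) in enumerate(predicted):
            # Proximity test: prune subbranches whose observed segment
            # is far from the predicted image of the model segment.
            if hypot(ox - px, oy - py) <= tol:
                extend(assign + [j])
        # The observed segment may also match no model segment.
        extend(assign + [None])

    extend([])
    return results


# Hypothetical data: three observed midpoints, two predicted ones.
observed = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
predicted = [(0.5, 0.0), (10.0, 0.5)]

survivors = pruned_search(observed, predicted, tol=1.0)
best = max(survivors, key=lambda a: sum(m is not None for m in a))
print(tree_size(len(predicted), len(observed)), len(survivors), best)
```

With these hypothetical inputs the unpruned tree has 27 complete assignments, while only 4 survive the proximity test. In the method described above the pose estimate is refined as the search proceeds, so the predicted positions would change at each stage instead of staying fixed as they do here.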