Contributions to a 3-D robot vision system: grouping from sparse and incomplete data

The subject of this thesis is intermediate-level vision, which is sometimes said to be the most underdeveloped domain in computer vision to date. The task of intermediate-level vision is to find sets of composite entities, which are structured combinations of the simple features in image data. This process is often called grouping: generating order from chaos by detecting structures of single physical objects or their parts. Structural grouping of information plays a fundamental role in both computer vision and human visual systems. This thesis explores new methods for grouping sparse and incomplete data to find descriptions of shape, and illustrates their use for object recognition. The major issues in developing the new methods have been computational efficiency and robustness. The main motivation has been to provide methods for a model-based, three-dimensional object recognition system. This visual recognition system was used to control a robot arm that was capable of manipulating complex, three-dimensional, real-world objects in close to real time.

One part of this thesis focuses on adaptive setting of the poll size in Probabilistic Hough Transforms from sparse data. The Hough Transform is widely used in computer vision for grouping data and for object recognition. Computing the Hough Transform requires a large number of operations. In many applications, however, only a small subset of sparse data is required for reliable object detection. It is demonstrated experimentally that the suggested adaptive stopping rules call for polls whose average size is lower than the fixed poll size that would lead to the same error rate. Elegant stopping rules are described, which are based on the rank of the highest peaks detected in the accumulator array.

Another part of this thesis introduces grouping processes for finding axial descriptions of symmetrical shapes from incomplete edge data. In many important applications, three-dimensional objects exhibit symmetrical two-dimensional projections. A particular axial description of shape is introduced, which consists of a set of linear segment pairs. An elaborate two-phase grouping process is presented, which yields complete descriptions of objects from incomplete data. The detected shape descriptions are shown to be useful for feature extraction, object recognition, shape description, and stereo correspondence.
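
The idea of an adaptive poll size in a Probabilistic Hough Transform can be illustrated with a minimal sketch. The code below is not the stopping rule developed in this thesis; it assumes a simple stand-in rule in which polling stops once the top-ranked accumulator cell has remained unchanged for a number of consecutive checks. The function name `adaptive_hough_line` and the parameters `check_every` and `stable_checks` are hypothetical and chosen only for illustration.

```python
import math
import random


def adaptive_hough_line(points, n_theta=90, rho_res=1.0,
                        check_every=10, stable_checks=5, seed=0):
    """Poll edge points in random order and vote for lines in (theta, rho).

    Assumed stopping rule (illustrative only): stop as soon as the
    top-ranked accumulator cell has been the same for `stable_checks`
    consecutive checks, with one check every `check_every` polled points.
    Returns (theta, rho, number_of_points_polled).
    """
    order = list(points)
    if not order:
        raise ValueError("at least one point is required")
    random.Random(seed).shuffle(order)  # random poll order

    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    acc = {}  # sparse accumulator: (theta index, rho index) -> votes
    leader, stable, polled = None, 0, 0

    for polled, (x, y) in enumerate(order, start=1):
        # Each polled point votes once per discretised angle.
        for k, t in enumerate(thetas):
            rho = x * math.cos(t) + y * math.sin(t)
            cell = (k, round(rho / rho_res))
            acc[cell] = acc.get(cell, 0) + 1

        if polled % check_every == 0:
            top = max(acc, key=acc.get)  # current highest peak
            stable = stable + 1 if top == leader else 1
            leader = top
            if stable >= stable_checks:
                break  # peak rank is stable: stop polling early

    if leader is None:  # fewer points than one check interval
        leader = max(acc, key=acc.get)
    k, r = leader
    return thetas[k], r * rho_res, polled
```

Called on a point set that is dominated by one straight line, the function returns the line parameters together with the number of points that were actually polled, which can be compared with the full data size to see how much work the early stop saved. A fixed poll size corresponds to dropping the stability check and breaking after a preset number of points.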