Segmentation of three-dimensional scenes using multi-modal interaction between machine vision and programmable, mechanical scene manipulation

The main idea in this dissertation is that one cannot discern the part-whole relationship of three-dimensional objects in a passive mode without a great deal of a priori information. Perceptual activity is exploratory, probing and searching. The main issue in active perception is control of the exploratory movements and the interaction between sensors and actions. Physical scene segmentation is the first step in active perception. The task of perception is greatly simplified if one has to deal with only one object at a time. The thesis adapts the non-deterministic Turing machine model and develops strategies to control the interaction between sensors and actions for physical scene segmentation and perception. Scene segmentation is formulated in graph theoretic terms as a graph generation/decomposition problem. The isomorphism between the manipulation actions and graph decomposition operations is defined. The sensors generate the directed graphs representing the spatial relations among connected surface regions and the manipulator decomposes these graphs under sensor supervision. Assuming a finite number of sensors and actions and a goal state, that is reachable and measurable with the available sensors, the control strategies converge. Methods of perception via iteration and interaction of vision, manipulation, force/torque and other sensory data are presented. No simulations were used in this thesis. Instead, an experimental system has been developed and integrated. This prototype consists of a vision system, a robot, an instrumented gripper equipped with force/torque and other sensors, several tools for manipulation, and a central computer/controller. The model has been tested in the USPS domain by applying it to the problem of sorting irregular parcels (IPPs). Many experiments have proved the validity of the underlying theory. Furthermore, error recovery and convergence of the model has been experimentally verified in a real, noisy, and dynamic environment.