A minimal representation framework for multisensor fusion and model selection

As robotic applications become more complex, it becomes increasingly difficult to select the model, parameters, and data subsamples a priori, and a consistent framework that applies across a wide variety of problem domains therefore becomes important. We present a general framework for multisensor fusion and model selection, which uses representation size (description length) as a universal, sensor-independent yardstick to choose (a) the model class and number of parameters from a library of environment models, (b) the model parameter resolution and data scaling, based on the sensor resolution and accuracy, (c) the subset of observed data features which are modeled (and are therefore used in the environment model parameter estimation), based on sensor precision, and (d) the correspondence which maps data features to model features. The minimal representation size criterion may be used for the environment model parameter estimation itself, or a more traditional statistical estimation method may be used instead. The search for the best interpretation may be conducted using (1) polynomial-time algorithms that use constraining data feature sets to instantiate environment models, or (2) evolution programs that use principles of natural selection to evolve a population of interpretations. The framework is applied to object recognition and pose estimation in two and three dimensions, using vision, touch, and grasp sensors. In the laboratory experiments, tactile data obtained from the fingertips of a robot hand holding an object in front of a camera is fused with the vision data from the camera to determine the object identity, the object pose, and the touch and vision data correspondences. The touch data is incomplete because of the hand configurations required to hold the object, while nearly half of the vision data is spurious because the hand appears in the image. Using either sensor alone results in ambiguous or incorrect interpretations, and multisensor fusion is necessary to consistently find the correct interpretation. The experiments use a differential evolution program to search for the best interpretation, and demonstrate that the framework leads to a practical method for solving multisensor fusion and model selection problems.
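For concreteness, a minimal representation size criterion of the kind described above typically takes an additive form along the following lines (a sketch based on the description in this abstract, not the exact encoding used in the paper): the total description length of an interpretation sums the cost of naming the model class, encoding its parameters at the chosen resolution, encoding the correspondence, encoding the matched data features as residuals with respect to the model, and encoding the unmatched data features directly,

    L(\mathcal{M}, \theta, \omega; Y) = L(\mathcal{M}) + L(\theta \mid \mathcal{M}) + L(\omega) + L(Y_{\omega} \mid \mathcal{M}, \theta, \omega) + L(Y_{\bar{\omega}}),

and the selected interpretation (\mathcal{M}^*, \theta^*, \omega^*) is the one that minimizes this total over the model library, the parameter values, the modeled data subset, and the candidate correspondences.

The differential evolution search mentioned in the experiments can likewise be sketched as follows. This is the standard DE/rand/1/bin scheme rather than the authors' specific implementation; the objective function, the parameter bounds, and the encoding of an interpretation as a real vector (including the name representation_size) are placeholders introduced here for illustration.

    import numpy as np

    def differential_evolution(objective, bounds, pop_size=30, F=0.8, CR=0.9,
                               generations=200, seed=0):
        # Minimize `objective` over the box `bounds` with the DE/rand/1/bin scheme.
        rng = np.random.default_rng(seed)
        lo, hi = np.asarray(bounds, dtype=float).T        # bounds: sequence of (low, high) pairs
        dim = len(bounds)
        pop = rng.uniform(lo, hi, size=(pop_size, dim))   # random initial population of candidates
        cost = np.array([objective(x) for x in pop])
        for _ in range(generations):
            for i in range(pop_size):
                # Mutation: combine three distinct population members other than i.
                a, b, c = rng.choice([j for j in range(pop_size) if j != i],
                                     size=3, replace=False)
                mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
                # Binomial crossover, forcing at least one component from the mutant.
                cross = rng.random(dim) < CR
                cross[rng.integers(dim)] = True
                trial = np.where(cross, mutant, pop[i])
                # Greedy selection: keep the trial vector if it does not increase the cost.
                trial_cost = objective(trial)
                if trial_cost <= cost[i]:
                    pop[i], cost[i] = trial, trial_cost
        best = int(np.argmin(cost))
        return pop[best], cost[best]

    # Hypothetical usage: the vector x would encode the pose parameters, and the
    # objective would decode a correspondence and return the total description length.
    # best_x, best_len = differential_evolution(representation_size, [(-1.0, 1.0)] * 6)

In such a setup the continuous vector carries the pose (and any other numeric) parameters, while discrete choices such as the model class and the data-to-model correspondence are either decoded from the vector or resolved inside the objective before the description length is computed.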