Integration of bottom-up and top-down cues in Bayesian network for object detection

Automatic detection of objects in cluttered images involves considerable uncertainty and remains a major challenge in computer vision. Humans, however, can easily find objects of interest through the mechanisms of selective visual attention. Inspired by this, we propose a visual attention model that integrates bottom-up and top-down cues in a Bayesian network. In this model, bottom-up color and shape cues are combined with top-down color and shape priors, and all of these cues are tied to their locations, simulating the convergence of bottom-up and top-down cues in visual area V4 of the human visual system. A Bayesian network is then constructed from these cues, with each cue represented as a node in the network. Finally, a saliency map that localizes the target objects is produced by inference in the Bayesian network. This inference greatly reduces the uncertainty in object detection. Experiments show that the model can detect objects of interest in images with complex backgrounds, even when the objects differ in size, color, or shape and appear at different positions in the image. Compared with the well-known visual attention model of Itti, the advantage of our model is that it can recover the contours of objects, which is very helpful for subsequent object recognition.
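As a rough illustration of how inference over cue nodes can yield a saliency map, the sketch below computes a per-location posterior for a binary "target present" node from one color cue and one shape cue. It is not the network described above: the naive-Bayes structure (cues conditionally independent given the target), the function names `posterior_target` and `saliency_map`, the cue labels, and all probability values are assumptions chosen for illustration only.

```python
def posterior_target(color_like, shape_like, prior=0.5):
    """Posterior P(T=1 | color, shape) at one location.

    Assumed structure: color and shape cues are conditionally
    independent given the target node T (a naive-Bayes network).
    Each *_like argument is a pair (P(obs | T=1), P(obs | T=0)).
    """
    num = prior * color_like[0] * shape_like[0]
    den = num + (1.0 - prior) * color_like[1] * shape_like[1]
    return num / den if den > 0 else 0.0


def saliency_map(color_obs, shape_obs, color_model, shape_model, prior=0.5):
    """Build a 2D saliency map of posteriors, one value per location.

    color_obs / shape_obs: 2D lists of observed (bottom-up) cue labels.
    color_model / shape_model: dicts mapping a label to its likelihood
    pair, standing in for the top-down priors about the target.
    """
    return [
        [
            posterior_target(color_model[c], shape_model[s], prior)
            for c, s in zip(color_row, shape_row)
        ]
        for color_row, shape_row in zip(color_obs, shape_obs)
    ]


# Hypothetical top-down knowledge: the target tends to be red and round.
color_model = {"red": (0.8, 0.2), "green": (0.2, 0.8)}
shape_model = {"round": (0.7, 0.3), "edge": (0.3, 0.7)}

# Hypothetical bottom-up observations on a 2x2 grid of locations.
color_obs = [["red", "green"], ["red", "green"]]
shape_obs = [["round", "edge"], ["edge", "edge"]]

smap = saliency_map(color_obs, shape_obs, color_model, shape_model)
```

In this toy setting, a location where both cues agree with the target description receives a higher posterior than a location where only one cue agrees, which is the sense in which combining cues reduces uncertainty. A fuller model would also include location nodes and learned, rather than hand-set, probabilities.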