Incorporating camera metadata for attended region detection and consumer photo classification

Photos taken by human beings differ significantly from pictures captured by a surveillance camera or a vision sensor on a robot: people intentionally compose photos to express their feelings or to record a memorable scene. This creative capture process is accomplished by adjusting two factors: (1) the parameter settings of the camera; and (2) the position of the camera relative to the objects or scenes of interest. To enable automatic understanding and interpretation of photo semantics, it is very important to take both of these factors into account. Unfortunately, most existing algorithms for image understanding focus only on image content while completely ignoring these two important factors. In this paper, we develop a new algorithm that infers what the photographer found interesting and what the core content of a photo is. The resulting information (i.e., the attended regions and the photographer's attention) is further used to support more effective photo classification and retrieval. Our experiments on 70,000+ photos taken by 200+ different camera models have produced very positive results.
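To make the idea concrete, the sketch below illustrates how EXIF-style camera metadata might hint at the scene type of a consumer photo. The field names, thresholds, and the heuristic itself are illustrative assumptions for this sketch only, not the paper's actual algorithm.

```python
# Hypothetical sketch: using camera metadata (EXIF-style fields) to make a
# coarse scene-type guess. Field names and thresholds are assumptions.

def guess_scene_type(meta):
    """Return a coarse scene guess from a dict of camera metadata.

    meta may contain the optional keys:
      'subject_distance_m', 'focal_length_mm', 'f_number'
    """
    dist = meta.get('subject_distance_m')
    focal = meta.get('focal_length_mm')
    fnum = meta.get('f_number')

    # A nearby subject shot with a wide aperture suggests a close-up:
    # the photographer focused on a specific attended region.
    if dist is not None and dist < 3.0 and (fnum is None or fnum < 5.6):
        return 'close-up'
    # A distant subject with a short focal length suggests a landscape-style
    # scene with no single attended object.
    if dist is not None and dist > 10.0 and (focal is None or focal < 50):
        return 'landscape'
    return 'unknown'

if __name__ == '__main__':
    portrait = {'subject_distance_m': 1.2, 'f_number': 2.8, 'focal_length_mm': 85}
    scenery = {'subject_distance_m': 100.0, 'focal_length_mm': 24}
    print(guess_scene_type(portrait))  # close-up
    print(guess_scene_type(scenery))   # landscape
```

In a real system these metadata cues would be combined with the image content itself, as the paper proposes, rather than used in isolation.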