Detecting, localizing and grouping repeated scene elements from an image

This paper presents an algorithm for detecting, localizing and grouping instances of repeated scene elements in a single image. The grouping is represented by a graph in which nodes correspond to individual elements and arcs join spatially neighboring elements. Associated with each arc is the affine map that best transforms the image patch at one element's location onto the patch at the other. The proposed approach consists of four steps: (1) detecting “interesting” elements in the image; (2) matching elements with their neighbors and estimating the affine transform between them; (3) growing each element to form a more distinctive unit; and (4) grouping the elements. The idea is analogous to tracking in dynamic imagery: in temporal tracking, one searches for an element in neighboring image frames, whereas here we “track” an element to spatially neighboring locations within the same image.
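
As an illustration of the graph representation described above, the following is a minimal sketch, not the paper's implementation. The names `Affine`, `ElementGraph` and `build_demo_graph` are hypothetical, element positions are invented for the example, and step (4) is approximated here by connected components of the neighbor graph, which is an assumption since the abstract does not specify the grouping rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Affine:
    """2D affine map p -> A p + t (hypothetical container, identity by default)."""
    a11: float = 1.0
    a12: float = 0.0
    a21: float = 0.0
    a22: float = 1.0
    tx: float = 0.0
    ty: float = 0.0

    def apply(self, p):
        # Map a point (x, y) through the affine transform.
        x, y = p
        return (self.a11 * x + self.a12 * y + self.tx,
                self.a21 * x + self.a22 * y + self.ty)


@dataclass
class ElementGraph:
    """Nodes are element centers; each arc (i, j) stores the map taking i to j."""
    nodes: list = field(default_factory=list)
    arcs: dict = field(default_factory=dict)

    def add_node(self, center):
        self.nodes.append(center)
        return len(self.nodes) - 1

    def add_arc(self, i, j, affine):
        self.arcs[(i, j)] = affine

    def groups(self):
        # Assumed grouping rule: connected components, arcs treated as undirected.
        adj = {i: set() for i in range(len(self.nodes))}
        for i, j in self.arcs:
            adj[i].add(j)
            adj[j].add(i)
        seen, out = set(), []
        for start in range(len(self.nodes)):
            if start in seen:
                continue
            seen.add(start)
            stack, comp = [start], []
            while stack:
                n = stack.pop()
                comp.append(n)
                for m in adj[n]:
                    if m not in seen:
                        seen.add(m)
                        stack.append(m)
            out.append(sorted(comp))
        return out


def build_demo_graph():
    # Four elements repeated along a row, plus one unrelated element (made-up data).
    g = ElementGraph()
    ids = [g.add_node((10.0 * k, 0.0)) for k in range(4)]
    g.add_node((100.0, 50.0))
    shift = Affine(tx=10.0)  # pure translation between neighbors
    for a, b in zip(ids, ids[1:]):
        g.add_arc(a, b, shift)
    return g
```

In this toy example, applying the affine map stored on an arc to the center of its source element yields the center of the neighboring element, and `groups()` separates the row of repeated elements from the isolated one. In practice the affine map would be estimated from image patches rather than set by hand.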