Tracking based on local motion estimation of spatio-temporally weighted salient points

The extraction of a video object contour, called "rotoscoping" in cinematographic post-production, is usually performed manually, frame by frame. Semi-automatic algorithms have been proposed to reduce this workload. However, they typically rely on region information and on some notion of object homogeneity. Such a homogeneity criterion can be difficult to define, and the resulting tracking may therefore lack precision. The proposed method instead relies on the analysis of the temporal trajectories of a set of salient points, or keypoints, called tracks. The main contribution of this paper is the local estimation, both spatially and temporally, of the contour motion from these tracks. The proposed method appears to be accurate and robust to outliers, and it accommodates local deformations. Moreover, it can handle partial occlusions.
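The abstract does not say how the tracks are weighted. As a purely illustrative sketch, assuming Gaussian weights in space and in time and a roughly constant local displacement over a short temporal window, the motion of a single contour point could be estimated as a weighted mean of the displacements of nearby tracks. The function name `estimate_local_motion`, the `Track` representation and the parameters `sigma_s`, `sigma_t` and `half_window` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical track representation: frame index -> keypoint position (x, y).
# Frames where the keypoint is not visible (e.g. occluded) are simply absent.
Track = Dict[int, Point]


def estimate_local_motion(
    point: Point,
    frame: int,
    tracks: List[Track],
    sigma_s: float = 20.0,  # assumed spatial bandwidth, in pixels
    sigma_t: float = 2.0,   # assumed temporal bandwidth, in frames
    half_window: int = 3,   # assumed temporal half-window, in frames
) -> Optional[Point]:
    """Estimate the displacement of `point` between `frame` and `frame + 1`.

    Illustrative sketch only: each track visible at `frame` contributes its
    frame-to-frame displacements from a window around `frame`, weighted by a
    Gaussian of its spatial distance to `point` (measured at `frame`) times a
    Gaussian of the time offset. Returns None when no weighted displacement
    is available.
    """
    num_x = num_y = total = 0.0
    for track in tracks:
        anchor = track.get(frame)
        if anchor is None:
            continue  # track not visible at the current frame
        dist2 = (anchor[0] - point[0]) ** 2 + (anchor[1] - point[1]) ** 2
        w_s = math.exp(-dist2 / (2.0 * sigma_s ** 2))
        for t in range(frame - half_window, frame + half_window + 1):
            p0, p1 = track.get(t), track.get(t + 1)
            if p0 is None or p1 is None:
                continue  # displacement undefined at this time step
            w = w_s * math.exp(-((t - frame) ** 2) / (2.0 * sigma_t ** 2))
            num_x += w * (p1[0] - p0[0])
            num_y += w * (p1[1] - p0[1])
            total += w
    if total == 0.0:
        return None
    return (num_x / total, num_y / total)


# Usage: two tracks translating by (1, 0) per frame; the second has a gap
# (frames 6-7 missing), mimicking a partial occlusion.
tracks = [
    {f: (10.0 + f, 5.0) for f in range(10)},
    {f: (30.0 + f, 40.0) for f in range(10) if f not in (6, 7)},
]
print(estimate_local_motion((12.0, 6.0), 4, tracks))
```

A plain weighted mean is not itself robust to outliers; in this sketch, distant tracks with inconsistent motion are only down-weighted by the spatial Gaussian. A weighted median or a robust fit could be substituted, and the paper's actual estimator may differ from either.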