An unsupervised multi-resolution object extraction algorithm using video-cube

We propose a fast video object segmentation method that detects object boundaries accurately and requires no user assistance. Video streams are treated as 3D data, called video-cubes, to take advantage of 3D signal processing techniques. After a video sequence is filtered, marker nodes are selected from the color gradient. A volume is grown around each marker using color/texture distance criteria, and volumes with similar characteristics are then merged. Self-descriptors are computed for each volume and mutual descriptors for each pair of volumes; these descriptors capture the motion and spatial information of the volumes. In the clustering stage, volumes are classified into objects in a fine-to-coarse hierarchy. While descriptor-based adaptive thresholds are applied and relaxed, similarity scores are estimated for each possible pairwise combination of volumes, and the pair with the maximum score is merged at each iteration. Finally, an object-based multi-resolution representation tree is assembled.
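
The clustering stage described above can be illustrated with a minimal sketch of greedy fine-to-coarse merging. It is not the implementation used in this work: the function name `cluster_volumes`, the use of a single descriptor vector per volume, the negative Euclidean distance as similarity score, and the size-weighted averaging of merged descriptors are all assumptions made for illustration, and the adaptive thresholds are omitted. The sketch only shows how repeatedly merging the best-scoring pair yields a multi-resolution tree.

```python
import numpy as np


def cluster_volumes(descriptors):
    """Greedy fine-to-coarse merging of volumes (illustrative sketch).

    descriptors: one feature vector per volume (hypothetical stand-in for
    the self/mutual descriptors mentioned in the text).
    Returns a list of merges (child_a, child_b, parent_id, score), ordered
    from the finest level to the coarsest, which encodes a binary tree.
    """
    clusters = {i: np.asarray(d, dtype=float) for i, d in enumerate(descriptors)}
    sizes = {i: 1 for i in clusters}
    next_id = len(clusters)
    tree = []

    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        # Score every pairwise combination; keep the maximum.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Assumed similarity: negative Euclidean distance.
                score = -float(np.linalg.norm(clusters[a] - clusters[b]))
                if best is None or score > best[0]:
                    best = (score, a, b)

        score, a, b = best
        total = sizes[a] + sizes[b]
        # Assumed merge rule: size-weighted mean of the two descriptors.
        merged = (sizes[a] * clusters[a] + sizes[b] * clusters[b]) / total

        for key in (a, b):
            del clusters[key]
            del sizes[key]
        clusters[next_id] = merged
        sizes[next_id] = total
        tree.append((a, b, next_id, score))
        next_id += 1

    return tree


if __name__ == "__main__":
    # Three toy volumes: two similar ones and one distant one.
    for merge in cluster_volumes([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]):
        print(merge)
```

Each entry of the returned list corresponds to one level of the hierarchy, so truncating the list after a given number of merges gives the segmentation at that resolution.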
