Segmentation and tracking of multiple video objects

This paper describes a technique that produces a content-based representation of a video shot composed of a background (still) mosaic and one or more foreground moving objects. Segmentation of moving objects is based on ego-motion compensation and on background modelling using tools from robust statistics. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region descriptors in two consecutive frames and uses singular value decomposition to compute a set of correspondences satisfying both the principle of proximity and the principle of exclusion. The sequence is represented as a layered graph, and specific techniques are introduced to cope with crossing and occlusion. Examples of MPEG-4 (main profile) encoding are reported.
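Here is a minimal sketch of robust background modelling in Python with NumPy. It assumes the frames have already been warped into the mosaic's reference frame by ego-motion compensation. The per-pixel temporal median and the median absolute deviation (MAD) serve as the robust location and scale estimates. This is an illustrative choice. The function names, the threshold `k` and the `sigma_floor` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def robust_background(stack):
    """Per-pixel temporal median and MAD-based scale.

    stack: array of shape (T, H, W) with grey-level frames ASSUMED to be
    already registered to a common reference (ego-motion compensated).
    """
    median = np.median(stack, axis=0)
    mad = np.median(np.abs(stack - median), axis=0)
    # 1.4826 turns the MAD into a consistent estimate of sigma for Gaussian noise
    sigma = 1.4826 * mad
    return median, sigma

def foreground_mask(frame, median, sigma, k=3.0, sigma_floor=1.0):
    """Hypothetical rule: flag pixels whose residual exceeds k robust sigmas."""
    # The floor avoids a zero scale where the background is perfectly static
    scale = np.maximum(sigma, sigma_floor)
    return np.abs(frame - median) > k * scale

# Usage: five static frames with a bright 2x2 object in frame 2 only
stack = np.full((5, 4, 4), 100.0)
stack[2, 1:3, 1:3] = 200.0
med, sig = robust_background(stack)
mask = foreground_mask(stack[2], med, sig)
```

The median ignores up to half of the samples at each pixel. As a result, an object that covers a pixel in only a few frames does not corrupt the background estimate, which is the reason for using robust statistics at this step.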
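The matching step combines Mahalanobis distances, an SVD, and the principles of proximity and exclusion. This matches the well-known Scott and Longuet-Higgins pairing scheme, and the sketch below assumes that scheme. Distances are turned into a Gaussian proximity matrix, its singular values are replaced by ones, and a pair (i, j) is accepted only when the entry is the maximum of both its row and its column. The Gaussian width `sigma`, the shared descriptor covariance `cov`, and the function names are assumptions for illustration. The paper's exact kernel and descriptors may differ.

```python
import numpy as np

def mahalanobis_matrix(A, B, cov):
    """Pairwise Mahalanobis distances between rows of A (m x k) and B (n x k).

    cov: k x k descriptor covariance, ASSUMED shared by all regions.
    """
    inv = np.linalg.inv(cov)
    diff = A[:, None, :] - B[None, :, :]            # shape (m, n, k)
    d2 = np.einsum('mnk,kl,mnl->mn', diff, inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def svd_match(A, B, cov, sigma=1.0):
    """Return (i, j) pairs: region i at frame t matched to region j at t+1."""
    d = mahalanobis_matrix(A, B, cov)
    G = np.exp(-d**2 / (2.0 * sigma**2))            # proximity: near pairs -> ~1
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt                                      # singular values set to 1
    pairs = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        # Exclusion: keep the pair only if it also wins its column
        if i == int(np.argmax(P[:, j])):
            pairs.append((i, j))
    return pairs

# Usage: three region descriptors, permuted and slightly perturbed in the next frame
A = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
B = np.array([[0.2, 10.1], [0.1, -0.1], [10.3, 0.2]])
pairs = svd_match(A, B, np.eye(2))
```

Replacing the singular values with ones gives the matrix with orthonormal rows or columns that is closest to the proximity matrix, so each region competes for a single partner. The mutual row/column maximum test then enforces exclusion. Regions without a strong partner, such as ones that appear, disappear or become occluded, are left unmatched.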
