Spatiotemporal segmentation and tracking of objects for visualization of videoconference image sequences

A procedure is described for the segmentation, content-based coding and visualization of videoconference image sequences. First, image sequence analysis is used to estimate the shape and motion parameters of the person facing the camera. The foreground object is segmented into a number of subobjects in order to identify the facial region. For this purpose, we propose a novel K-Means algorithm with a connectivity constraint as a general segmentation algorithm that combines several types of information, including intensity, motion and compactness. The algorithm introduces spatiotemporal regions: because a number of frames are analyzed simultaneously, the same region is present in consecutive frames. Based on this information, a 3D ellipsoid is fitted to the person's face using an efficient and robust algorithm. The rigid 3D motion is then estimated using a least median of squares approach. Finally, a VRML file is created containing all of the estimated information; this file may be viewed with any VRML 2.0 compliant browser.
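
The least median of squares idea mentioned above can be illustrated on a much simpler problem than rigid 3D motion. The sketch below is not the procedure of this paper; it assumes a pure 2D translation between matched points, and the function name lmeds_translation and the sample data are hypothetical. Each correspondence proposes a candidate translation, and the candidate whose squared residuals have the smallest median is kept, so that a minority of gross mismatches does not bias the estimate.

```python
from statistics import median


def lmeds_translation(src, dst):
    """Estimate a 2D translation (tx, ty) mapping src onto dst.

    Illustrative least-median-of-squares search: every single
    correspondence is a minimal sample that defines one candidate
    translation; the candidate with the smallest median squared
    residual over all correspondences wins.
    """
    best_t, best_med = None, float("inf")
    for (x0, y0), (x1, y1) in zip(src, dst):
        tx, ty = x1 - x0, y1 - y0  # candidate from one correspondence
        residuals = [
            (xs + tx - xd) ** 2 + (ys + ty - yd) ** 2
            for (xs, ys), (xd, yd) in zip(src, dst)
        ]
        med = median(residuals)
        if med < best_med:
            best_t, best_med = (tx, ty), med
    return best_t, best_med


# Hypothetical data: seven matched points, true translation (2, -1).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
       (2.0, 1.0), (2.0, 2.0), (3.0, 1.0)]
dst = [(x + 2.0, y - 1.0) for x, y in src]
dst[1] = (9.0, 9.0)    # gross outlier (wrong match)
dst[4] = (-5.0, 3.0)   # gross outlier (wrong match)

t, med = lmeds_translation(src, dst)

# Ordinary least squares for a translation is the mean displacement,
# which the two outliers pull away from the true value.
mean_tx = sum(xd - xs for (xs, _), (xd, _) in zip(src, dst)) / len(src)

print("LMedS translation:", t, "median squared residual:", med)
print("mean (least squares) tx:", mean_tx)
```

In this toy case two of the seven matches are wrong, yet the median-based criterion recovers the true translation, whereas the mean displacement does not. A full rigid 3D motion estimator would replace the one-point candidate with a larger minimal sample and would usually draw samples at random rather than exhaustively.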
