Statistical image sequence segmentation using multidimensional attributes

A novel interactive approach is presented for finding multiple objects in an image sequence. The segmentation technique requires only a small number of user-supplied training samples to model the multidimensional probability distribution function (PDF) of the image characteristics in each region. These training samples are tracked over time, and the PDF of each region can be updated at every frame. Each unlabeled data point is assigned to the most likely region, based on the distance between the PDF of each region and the point's multidimensional feature value. The certainty of that classification is computed and stored at each sample for additional post-processing and analysis of the segmentation. Such a system has immediate applications in image processing and computer vision. Image attributes such as color, texture, motion, and position are used to characterize the different regions, and constitute the multidimensional feature vector at every sample location in the image sequence. The PDF of each feature is estimated with expectation maximization (EM), which is explained in detail. Improvements to the basic EM algorithm, including deterministic annealing, are also tested. The experiments vary the feature calculation methods, feature combinations, PDF estimation techniques, and tracking of user-supplied training samples across several sequences. The experiments presented in this thesis show that the segmentation result is usually robust with respect to the method of calculating the features, and that the result is typically better when using all features instead of any one feature exclusively. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
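The classification idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single scalar feature per sample instead of a multidimensional vector, fits each region's PDF as a small Gaussian mixture with plain EM (no deterministic annealing), and labels a sample with the region of highest likelihood. The certainty measure (the winning likelihood divided by the sum over regions) and all names (`fit_gmm_em`, `classify`, the toy training values) are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(samples, n_components=2, n_iters=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture to training samples with basic EM."""
    lo, hi = min(samples), max(samples)
    # Spread the initial means evenly over the range of the data.
    means = [lo + (hi - lo) * (k + 0.5) / n_components for k in range(n_components)]
    centre = sum(samples) / len(samples)
    var0 = max(sum((x - centre) ** 2 for x in samples) / len(samples), min_var)
    variances = [var0] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against division by zero
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(samples)
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            variances[k] = max(spread, min_var)  # floor keeps the PDF from collapsing
    return weights, means, variances


def mixture_likelihood(x, model):
    """Likelihood of x under a fitted mixture (weights, means, variances)."""
    weights, means, variances = model
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def classify(x, region_models):
    """Return (most likely region, certainty in (0, 1]) for feature value x."""
    likes = {name: mixture_likelihood(x, model) for name, model in region_models.items()}
    total = sum(likes.values())
    best = max(likes, key=likes.get)
    certainty = likes[best] / total if total > 0 else 1.0 / len(likes)
    return best, certainty


# Hypothetical user-supplied training samples (e.g. a normalized color value).
training = {
    "background": [0.10, 0.12, 0.15, 0.11, 0.80, 0.82, 0.85],
    "object": [0.45, 0.50, 0.55, 0.48, 0.52],
}
models = {name: fit_gmm_em(values) for name, values in training.items()}
label, certainty = classify(0.5, models)
print(label, round(certainty, 3))
```

In a sequence setting, the same `fit_gmm_em` call could be repeated per frame on the tracked training samples to update each region's model, and the stored certainty values could drive later post-processing of low-confidence samples.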
