Layered representations for vision and video

Human vision, machine vision, and image coding each demand representations that are both useful and efficient. The best-established techniques today are based on low-level processing. Future systems for image analysis and image coding will increasingly use image representations built on concepts such as surfaces, lighting, and transparency. These representations fall in the domain of "mid-level" vision, and there is accumulating evidence of their importance in human vision. By representing images with these more sophisticated vocabularies, we can increase the flexibility and efficiency of our vision and image coding systems. We are developing systems that decompose image sequences into overlapping layers, rather like the "cels" used by a traditional animator. These layers are ordered in depth; they slide over one another and are combined according to the rules of transparency and occlusion. Using the layered representation, we can achieve greatly improved motion analysis and image segmentation. Applying layers to image coding yields data compression far better than MPEG, with frame-rate independence as a side benefit. Moreover, the image sequence is decomposed in a meaningful way, which allows flexible image editing and access.
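
To make the compositing idea concrete, the following is a minimal sketch of how depth-ordered layers might be combined into a frame. It is an illustration under stated assumptions, not the system described above: the `Layer` class, the `shift` and `render` helpers, the per-layer opacity ("alpha") map, and the restriction to whole-pixel translation are all hypothetical choices made for this example. Each layer is drawn back to front, and each pixel is blended as alpha times the layer intensity plus (1 - alpha) times whatever lies behind it.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[float]]  # rows of pixel values


@dataclass
class Layer:
    """One hypothetical layer: appearance, opacity, and a translational motion."""
    intensity: Image                    # what the layer looks like
    alpha: Image                        # opacity in [0, 1]; 0 means fully transparent
    velocity: Tuple[float, float] = (0.0, 0.0)  # (dx, dy) in pixels per unit time (assumed)


def shift(img: Image, dx: int, dy: int, fill: float = 0.0) -> Image:
    """Translate an image by whole pixels, filling uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def render(layers: List[Layer], t: float) -> Image:
    """Composite layers (ordered back to front) into the frame at time t."""
    h, w = len(layers[0].intensity), len(layers[0].intensity[0])
    frame = [[0.0] * w for _ in range(h)]
    for layer in layers:
        # Slide the layer according to its own motion (rounded to whole pixels).
        dx = round(layer.velocity[0] * t)
        dy = round(layer.velocity[1] * t)
        e = shift(layer.intensity, dx, dy)
        a = shift(layer.alpha, dx, dy)  # shifted-in pixels get alpha 0 (transparent)
        for y in range(h):
            for x in range(w):
                # Nearer layers occlude or partially cover what lies behind them.
                frame[y][x] = a[y][x] * e[y][x] + (1.0 - a[y][x]) * frame[y][x]
    return frame


# A static opaque background and a one-pixel foreground moving right.
background = Layer(intensity=[[0.2, 0.2, 0.2, 0.2]], alpha=[[1.0, 1.0, 1.0, 1.0]])
foreground = Layer(intensity=[[1.0, 0.0, 0.0, 0.0]],
                   alpha=[[1.0, 0.0, 0.0, 0.0]],
                   velocity=(1.0, 0.0))
frame_at_2 = render([background, foreground], t=2)
```

Because `render` takes a time `t` rather than a frame index, the same layers can in principle be sampled at any rate, which is one way to read the frame-rate independence mentioned above. Editing a single layer's `intensity` or `alpha` map changes every rendered frame, which hints at why such a decomposition is convenient for editing.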
