Bilayer Segmentation of Live Video in Uncontrolled Environments for Background Substitution: An Overview and Main Challenges

Bilayer segmentation of live video in uncontrolled environments is an essential task for home applications in which the original background of the scene must be replaced, as in video chats or traditional videoconferencing. The main challenge in such conditions is to overcome the difficulties posed by problem situations that may occur while the video is being captured, such as illumination changes, distracting events (e.g., elements moving in the background), and camera shake. This paper presents a survey of segmentation methods for background substitution applications, describes the main concepts, and identifies events that may cause errors. Our analysis shows that, although robust methods rely on specific devices (multiple cameras or sensors that generate depth maps) to aid the process, most current research aimed at achieving the same results with conventional devices (monocular video cameras) relies on energy minimization frameworks, in which temporal and spatial information are probabilistically combined with color and contrast information.
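As an illustration of the kind of energy minimization framework described above, the following Python sketch labels each pixel of a toy grayscale frame as foreground (1) or background (0) by exactly minimizing a binary energy E(α) = Σ_p D_p(α_p) + Σ_(p,q) γ·exp(−β(I_p − I_q)²)·[α_p ≠ α_q] with an s-t min-cut. Everything here is an assumption for illustration, not a method taken from the surveyed work: the single-Gaussian color models with hypothetical means and standard deviations, the temporal term (a fixed penalty, `temporal_weight`, for disagreeing with the previous frame's mask), the contrast-sensitive spatial term with β = 1/(2⟨(I_p − I_q)²⟩) in the style of GrabCut, and the hypothetical function names `segment`, `model_terms`, `energy`, and `min_cut_source_side`. A plain Edmonds-Karp max-flow stands in for the faster specialized solvers used in real-time systems.

```python
import math
from collections import deque


def gaussian_nll(value, mean, std):
    """Negative log-likelihood of an intensity under a 1-D Gaussian color model."""
    return 0.5 * ((value - mean) / std) ** 2 + math.log(std * math.sqrt(2.0 * math.pi))


def neighbour_pairs(h, w):
    """4-connected pixel pairs (right and down neighbours) as (r1, c1, r2, c2)."""
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                yield r, c, r, c + 1
            if r + 1 < h:
                yield r, c, r + 1, c


def model_terms(frame, prev_mask, fg, bg, gamma, temporal_weight):
    """Hypothetical model: unary costs unary[r][c][label] and pairwise weights."""
    h, w = len(frame), len(frame[0])
    pairs = list(neighbour_pairs(h, w))
    sq = [(frame[a][b] - frame[c][d]) ** 2 for a, b, c, d in pairs]
    mean_sq = sum(sq) / len(sq) if sq else 0.0
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0  # GrabCut-style contrast scale
    unary = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for label, (mu, sd) in ((0, bg), (1, fg)):
                # color likelihood + assumed temporal penalty for changing label
                unary[r][c][label] = (gaussian_nll(frame[r][c], mu, sd)
                                      + temporal_weight * (prev_mask[r][c] != label))
    # contrast-sensitive Potts weights: cheap to cut across strong intensity edges
    weights = [(p, gamma * math.exp(-beta * d2)) for p, d2 in zip(pairs, sq)]
    return unary, weights


def energy(mask, unary, weights):
    """E(mask): unary costs plus pairwise weights of neighbours with different labels."""
    e = sum(unary[r][c][mask[r][c]] for r in range(len(mask)) for c in range(len(mask[0])))
    return e + sum(wt for (a, b, c, d), wt in weights if mask[a][b] != mask[c][d])


def min_cut_source_side(n, edges, s, t):
    """Edmonds-Karp max-flow; returns nodes on the source side of a minimum s-t cut."""
    cap = [dict() for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # residual (reverse) edge
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # nodes still reachable from s in the residual graph
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push


def segment(frame, prev_mask, fg=(200.0, 30.0), bg=(60.0, 30.0), gamma=10.0, temporal_weight=2.0):
    """Exact minimiser of the energy via min-cut; 1 = foreground, 0 = background."""
    h, w = len(frame), len(frame[0])
    unary, weights = model_terms(frame, prev_mask, fg, bg, gamma, temporal_weight)
    s, t = h * w, h * w + 1
    edges = []
    for r in range(h):
        for c in range(w):
            d0, d1 = unary[r][c]
            m = min(d0, d1)  # constant shift keeps capacities non-negative
            edges.append((s, r * w + c, d0 - m))  # cut if pixel is background (sink side)
            edges.append((r * w + c, t, d1 - m))  # cut if pixel is foreground (source side)
    for (a, b, c, d), wt in weights:
        edges.append((a * w + b, c * w + d, wt))
        edges.append((c * w + d, a * w + b, wt))
    src = min_cut_source_side(h * w + 2, edges, s, t)
    return [[1 if r * w + c in src else 0 for c in range(w)] for r in range(h)]


if __name__ == "__main__":
    frame = [[50, 60, 210], [55, 205, 220], [58, 200, 215]]  # toy grayscale frame
    prev_mask = [[0, 0, 1], [0, 1, 1], [0, 1, 1]]             # previous frame's mask
    print(segment(frame, prev_mask))  # [[0, 0, 1], [0, 1, 1], [0, 1, 1]]
```

Because every pairwise weight is non-negative, this binary energy is submodular and the min-cut yields its global minimum rather than a local one. In a realistic pipeline, richer color models (e.g., Gaussian mixtures learned and updated from the video) and motion cues would replace these toy terms.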
