Background foreground segmentation with RGB-D Kinect data: An efficient combination of classifiers

Highlights
We use Kinect RGB-D data for foreground/background segmentation.
A combination of classifiers is used to improve segmentation performance.
We generate a hand-labeled, publicly available RGB-D benchmark dataset.

Low-cost RGB-D cameras such as Microsoft's Kinect or Asus's Xtion Pro are changing the computer vision world, as they are being used successfully in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving object detection through foreground/background segmentation. However, the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on the depth information alone, without taking the color information into account. The novel approach that we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average in which the support of each classifier in the ensemble is adaptively modified by considering foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections caused by critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects, and noisy depth measurements. Moreover, we present what is, to the best of the authors' knowledge, the first publicly available RGB-D benchmark dataset with hand-labeled ground truth for several challenging scenarios, intended for testing background/foreground segmentation algorithms.
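As an illustration of the weighted-average idea described above, the following is a minimal sketch and not the authors' implementation. It assumes each classifier outputs a per-pixel foreground probability in [0, 1]. The function names (`combine_classifiers`, `depth_weight_from_validity`), the `base_weight` value, the 0.5 threshold, and the rule that sets the depth weight to zero where the depth reading is missing are all hypothetical choices made for this example. The abstract states only that the weights adapt to previous foreground detections and to depth and color edges, and that adaptation is not modeled here.

```python
import numpy as np


def combine_classifiers(p_color, p_depth, w_depth, threshold=0.5):
    """Per-pixel weighted average of two foreground classifiers.

    p_color, p_depth: foreground probabilities in [0, 1], same shape.
    w_depth: per-pixel weight of the depth classifier in [0, 1];
             the color classifier receives 1 - w_depth.
    Returns (boolean foreground mask, combined score).
    """
    p_color = np.asarray(p_color, dtype=float)
    p_depth = np.asarray(p_depth, dtype=float)
    w_depth = np.clip(np.asarray(w_depth, dtype=float), 0.0, 1.0)
    score = w_depth * p_depth + (1.0 - w_depth) * p_color
    return score > threshold, score


def depth_weight_from_validity(depth_valid, base_weight=0.7):
    """Hypothetical weighting rule (an assumption, not from the paper):
    trust depth with base_weight where a measurement exists, and fall
    back entirely to color where the depth reading is missing."""
    return np.where(np.asarray(depth_valid, dtype=bool), base_weight, 0.0)


# Toy 2x2 frame: the bottom-left pixel has no valid depth reading.
p_color = np.array([[0.9, 0.2], [0.4, 0.1]])
p_depth = np.array([[0.8, 0.9], [0.9, 0.0]])
depth_valid = np.array([[True, True], [False, True]])

w_depth = depth_weight_from_validity(depth_valid)
mask, score = combine_classifiers(p_color, p_depth, w_depth)
print(mask)
print(score)
```

In this toy frame, the top-right pixel is weak in color (as with color camouflage) but is still detected because depth carries most of the weight, while the pixel with missing depth is decided by color alone.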
