Paper information - Multi-modal RGB–Depth–Thermal Human Body Segmentation

This work addresses the problem of human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, partitions the foreground regions into cells, computes a set of image features on those cells using several state-of-the-art feature extraction methods, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. The baseline, using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, outperforms other state-of-the-art methods, obtaining an overlap above 75% on the novel dataset when compared to the manually annotated ground truth of human segmentations.
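The structural steps of the baseline (cell partitioning, stacking of per-cell likelihoods, and overlap scoring) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the regular grid layout, and the reading of "overlap" as intersection over union are all assumptions, since the abstract does not specify them. Background subtraction, feature extraction, the Gaussian mixture models, and the Random Forest are left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout
Mask = List[List[int]]           # binary mask, 1 = person pixel


def partition_into_cells(box: Box, rows: int, cols: int) -> List[Box]:
    """Split a foreground bounding box into a rows x cols grid of cells.

    Assumption: a regular grid; the abstract only says regions are
    partitioned into cells.
    """
    x, y, w, h = box
    cells = []
    for r in range(rows):
        y0 = y + (r * h) // rows
        y1 = y + ((r + 1) * h) // rows
        for c in range(cols):
            x0 = x + (c * w) // cols
            x1 = x + ((c + 1) * w) // cols
            cells.append((x0, y0, x1 - x0, y1 - y0))
    return cells


def stack_likelihoods(per_modality: List[List[float]]) -> List[float]:
    """Concatenate per-cell likelihoods from each modality into one vector.

    The resulting vector is what a supervised learner (Random Forest in
    the abstract) would take as input.
    """
    return [p for modality in per_modality for p in modality]


def overlap(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks.

    Assumption: "overlap" is read here as the Jaccard index.
    """
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0
```

For example, `partition_into_cells((0, 0, 4, 6), rows=3, cols=2)` yields six 2x2 cells, and `stack_likelihoods` applied to three modalities with two cells each yields a six-element vector, one likelihood per cell and modality.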
