Multiple Human Tracking in RGB-D Data: A Survey

Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are limited by occlusions and illumination variations. In recent years, the arrival of cheap RGB-Depth (RGB-D) devices has led to many new approaches to MHT, many of which integrate color and depth cues to improve every stage of the process. In this survey, we present the common processing pipeline of these methods and review their methodology based on (a) how they implement this pipeline and (b) what role depth plays within each stage of it. We identify and introduce existing, publicly available benchmark datasets and software resources that fuse color and depth data for MHT. Finally, we present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets.
