Real-Time Multiple Human Perception With Color-Depth Cameras on a Mobile Robot

The ability to perceive humans is an essential requirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability of commercial color-depth cameras allow for the creation of a system that makes use of the depth dimension, thus enabling a robot to observe its environment and perceive in the 3-D space. Here we present a system for 3-D multiple human perception in real time from a moving robot equipped with a color-depth camera and a consumer-grade computer. Our approach reduces computation time to achieve real-time performance through a unique combination of new ideas and established techniques. We remove the ground and ceiling planes from the 3-D point cloud input to separate candidate point clusters. We introduce the novel information concept, depth of interest, which we use to identify candidates for detection, and that avoids the computationally expensive scanning-window methods of other approaches. We utilize a cascade of detectors to distinguish humans from objects, in which we make intelligent reuse of intermediary features in successive detectors to improve computation. Because of the high computational cost of some methods, we represent our candidate tracking algorithm with a decision directed acyclic graph, which allows us to use the most computationally intense techniques only where necessary. We detail the successful implementation of our novel approach on a mobile robot and examine its performance in scenarios with real-world challenges, including occlusion, robot motion, nonupright humans, humans leaving and reentering the field of view (i.e., the reidentification challenge), human-object and human-human interaction. We conclude with the observation that the incorporation of the depth information, together with the use of modern techniques in new ways, we are able to create an accurate system for real-time 3-D perception of humans by a mobile robot.

[1]  Neil J. Gordon,et al.  A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking , 2002, IEEE Trans. Signal Process..

[2]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Aurélie Bugeau,et al.  Tracking with Occlusions via Graph Cuts , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Hironobu Fujiyoshi,et al.  Real-Time Human Detection Using Relational Depth Similarity Features , 2010, ACCV.

[6]  A. G. Amitha Perera,et al.  A unified framework for tracking through occlusions and across sensor gaps , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Radu Bogdan Rusu,et al.  3D is here: Point Cloud Library (PCL) , 2011, 2011 IEEE International Conference on Robotics and Automation.

[8]  Ramakant Nevatia,et al.  Detection and Tracking of Multiple, Partially Occluded Humans by Bayesian Combination of Edgelet based Part Detectors , 2007, International Journal of Computer Vision.

[9]  Kikuo Fujimura,et al.  Human detection using depth and gray images , 2003, Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance, 2003..

[10]  James J. Little,et al.  A Boosted Particle Filter: Multitarget Detection and Tracking , 2004, ECCV.

[11]  Kai Oliver Arras,et al.  People tracking in RGB-D data with on-line boosted target models , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  Luc Van Gool,et al.  Robust Multiperson Tracking from a Mobile Platform , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Carlo Tomasi,et al.  People Detection Using Color and Depth Images , 2011, MCPR.

[14]  Antonio Torralba,et al.  Unsupervised Detection of Regions of Interest Using Iterative Link Analysis , 2009, NIPS.

[15]  Garry A. Einicke,et al.  Robust extended Kalman filtering , 1999, IEEE Trans. Signal Process..

[16]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[19]  Luc Van Gool,et al.  Online Multiperson Tracking-by-Detection from a Single, Uncalibrated Camera , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Yaakov Bar-Shalom,et al.  Sonar tracking of multiple targets using joint probabilistic data association , 1983 .

[21]  Roland Siegwart,et al.  A Layered Approach to People Detection in 3D Range Data , 2010, AAAI.

[22]  Jake K. Aggarwal,et al.  Human detection using depth information by Kinect , 2011, CVPR 2011 WORKSHOPS.

[23]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[24]  Rainer Stiefelhagen,et al.  Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics , 2008, EURASIP J. Image Video Process..

[25]  Silvio Savarese,et al.  Detecting and tracking people using an RGB-D camera via multiple detector fusion , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[26]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[27]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Ling Shao,et al.  Enhanced Computer Vision With Microsoft Kinect Sensor: A Review , 2013, IEEE Transactions on Cybernetics.

[29]  James K Richardson,et al.  Pilot evaluation of a novel clinical test of reaction time in national collegiate athletic association division I football players. , 2010, Journal of athletic training.

[30]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Cordelia Schmid,et al.  Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[32]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  E. Parzen On Estimation of a Probability Density Function and Mode , 1962 .

[34]  Oswald Lanz,et al.  Approximate Bayesian multibody tracking , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Paul A. Viola,et al.  Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.

[36]  Dariu Gavrila,et al.  Ieee Transactions on Intelligent Transportation Systems the Benefits of Dense Stereo for Pedestrian Detection , 2022 .

[37]  Sonia Chernova,et al.  Mobile human-robot teaming with environmental tolerance , 2009, 2009 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[38]  Rafael Muñoz-Salinas,et al.  People detection and tracking using stereo vision and color , 2007, Image Vis. Comput..

[39]  Horst Bischof,et al.  On-line Boosting and Vision , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[40]  Pietro Perona,et al.  The Fastest Pedestrian Detector in the West , 2010, BMVC.

[41]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[42]  Afshin Dehghan,et al.  Part-based multiple-person tracking with partial occlusion handling , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  James J. Little,et al.  Robust Visual Tracking for Multiple Targets , 2006, ECCV.