Visual Tracking of Multiple Humans with Machine Learning based Robustness Enhancement applied to Real-World Robotic Systems

This thesis presents a robust, real-time, vision-based 3D tracker capable of automatically identifying, labelling and tracking multiple humans. The primary contribution is a methodology to improve robustness and integration in real-world scenarios. The system consists of two stages: (1) a vision-based human tracking system that uses multiple visual cues and a robust occlusion handler, and (2) a machine-learning-based module for intelligent multi-modal fusion that is robust to drastic changes in lighting. The fusion module performs an online analysis of the image parameters that influence the performance of the tracker, and then generates optimal weights for each visual modality for the current scene. The thesis also proposes a novel approach to validating the 3D multiple human tracking system using zero-error ground truth data. Finally, it demonstrates integration into a variety of distributed robotic systems used in real-world applications.
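The idea of weighting each visual modality for the current scene can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the weighted geometric mean used here is an assumption, as are the modality names (`colour`, `motion`) and the helper names `normalise_weights` and `fuse_likelihoods`. In the thesis the weights come from a learned module; here they are simply passed in by hand.

```python
import math


def normalise_weights(raw_weights):
    """Scale non-negative per-modality weights so that they sum to 1.

    Falls back to uniform weights if every raw weight is zero.
    """
    total = sum(raw_weights.values())
    if total <= 0:
        n = len(raw_weights)
        return {name: 1.0 / n for name in raw_weights}
    return {name: w / total for name, w in raw_weights.items()}


def fuse_likelihoods(likelihoods, weights, eps=1e-12):
    """Combine per-modality likelihoods with a weighted geometric mean.

    This fusion rule is an illustrative assumption, not the thesis's method.
    The product of p_m ** w_m is computed in log space for numerical
    stability; eps guards against log(0).
    """
    log_score = sum(
        weights[name] * math.log(max(p, eps))
        for name, p in likelihoods.items()
    )
    return math.exp(log_score)


# Hypothetical scene: the colour cue is judged three times as reliable
# as the motion cue, e.g. because lighting is stable.
weights = normalise_weights({"colour": 3.0, "motion": 1.0})
score = fuse_likelihoods({"colour": 0.8, "motion": 0.4}, weights)
```

With normalised weights, the fused score always lies between the smallest and largest modality likelihood, so a cue judged unreliable for the current scene pulls the result less than a trusted one.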
