Video-guided Camera Control for Target Tracking and Following

Abstract

This paper considers the problem of controlling a nonholonomic mobile ground robot equipped with an onboard camera with a bounded field of view, tasked with detecting and following a potentially moving human target in real time using onboard computing and video processing. Computer vision algorithms have recently been shown to be highly effective at detecting and classifying objects in images obtained by vision sensors. However, existing methods typically assume a stationary camera and/or use pre-recorded image sequences that provide no causal relationship with future images. The control method developed in this paper seeks to improve the performance of the computer vision algorithms by planning the robot/camera trajectory relative to the moving target based on the desired size and position of the target in the image plane, without the need to estimate the target's range. The method is tested and validated in Unreal Engine™, a highly realistic and interactive game programming environment that allows closed-loop simulations of the robot-camera system. The results are further validated through physical experiments using a camera-equipped Clearpath™ Jackal robot that is capable of following a human target for long periods of time. Both simulation and experimental results show that the proposed vision-based controller is capable of stabilizing the target's size and position in the image plane over extended periods of time.
