Underwater multi-robot convoying using visual tracking by detection

We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of the image-based bounding-box estimates. This approach has the important advantage of mitigating tracking drift (i.e., gradually drifting away from the target object), which is a common failure mode of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings and hand-annotated its location in each frame. Using this dataset, we present an empirical comparison of multiple tracker variants, including several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot's independent motion.
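The interleaving of per-frame detection with temporal filtering can be illustrated with a minimal sketch. The abstract does not specify the filter used, so the exponential smoothing below, the class name `SmoothedBoxTracker`, and the parameters `alpha`, `min_confidence`, and `max_missed` are all assumptions chosen for illustration, not the authors' implementation. The detector itself is left out: the sketch only assumes it yields a `(cx, cy, w, h)` box and a confidence score per frame, or nothing when the target is not found.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Bounding box as (center x, center y, width, height) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class SmoothedBoxTracker:
    """Hypothetical tracking-by-detection filter (illustrative only).

    Each frame's detection is blended into a running estimate. Because
    every update is anchored to a fresh detection, the estimate cannot
    drift away from the target the way a model-free tracker can.
    """
    alpha: float = 0.5           # weight given to the newest detection
    min_confidence: float = 0.3  # detections below this are ignored
    max_missed: int = 5          # frames tolerated without a detection
    state: Optional[Box] = None
    missed: int = 0

    def update(self, detection: Optional[Box],
               confidence: float = 0.0) -> Optional[Box]:
        # No usable detection: hold the last estimate for a few frames,
        # then declare the target lost.
        if detection is None or confidence < self.min_confidence:
            self.missed += 1
            if self.missed > self.max_missed:
                self.state = None
            return self.state

        self.missed = 0
        if self.state is None:
            # (Re)initialize directly from the detector.
            self.state = detection
        else:
            # Exponential smoothing of each box coordinate.
            self.state = tuple(
                self.alpha * d + (1.0 - self.alpha) * s
                for d, s in zip(detection, self.state)
            )
        return self.state
```

In a convoying loop, the smoothed box (or the absence of one) would be passed to the follower's controller each frame. A filter with an explicit motion model, such as a Kalman filter, could be substituted without changing this interface.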
