Multi-view Sensor Fusion by Integrating Model-based Estimation and Graph Learning for Collaborative Object Localization

Collaborative object localization aims to jointly estimate the locations of objects observed from multiple views or perspectives, which is a critical ability for multi-agent systems such as connected vehicles. To enable collaborative localization, several model-based state estimation and learning-based localization methods have been developed. Despite their encouraging performance, model-based state estimation methods often cannot model the complex relationships among multiple objects, while learning-based methods typically cannot fuse observations from an arbitrary number of views and do not model uncertainty well. In this paper, we introduce a novel spatiotemporal graph filter approach that integrates graph learning and model-based estimation to perform multi-view sensor fusion for collaborative object localization. Our approach models complex object relationships using a new spatiotemporal graph representation and fuses multi-view observations in a Bayesian fashion to improve location estimation under uncertainty. We evaluate our approach in two applications: connected autonomous driving and multiple pedestrian localization. Experimental results show that our approach outperforms previous techniques and achieves state-of-the-art performance on collaborative localization.
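To make the idea of fusing observations from an arbitrary number of views in a Bayesian fashion concrete, the following is a minimal sketch of generic inverse-variance (product-of-Gaussians) fusion. It is not the spatiotemporal graph filter described above: the function name `fuse_views`, the independent-Gaussian noise model with per-axis variances, and the example numbers are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# One view's estimate of an object location: (mean, per-axis variance).
# Assumption: each view reports independent Gaussian noise on each axis.
Observation = Tuple[Sequence[float], Sequence[float]]


def fuse_views(observations: List[Observation]) -> Tuple[List[float], List[float]]:
    """Fuse any number of Gaussian location estimates by inverse-variance weighting."""
    if not observations:
        raise ValueError("need at least one observation")
    dim = len(observations[0][0])
    precision = [0.0] * dim   # accumulated 1 / variance per axis
    weighted = [0.0] * dim    # accumulated mean / variance per axis
    for mean, var in observations:
        for i in range(dim):
            w = 1.0 / var[i]
            precision[i] += w
            weighted[i] += w * mean[i]
    fused_mean = [weighted[i] / precision[i] for i in range(dim)]
    fused_var = [1.0 / p for p in precision]
    return fused_mean, fused_var


# Three hypothetical views of the same object in a shared 2D frame.
views = [
    ([10.0, 5.0], [1.0, 4.0]),
    ([12.0, 5.0], [1.0, 4.0]),
    ([11.0, 8.0], [2.0, 1.0]),
]
mean, var = fuse_views(views)
print(mean, var)
```

Under these assumptions, the fused variance on each axis is never larger than the smallest single-view variance, and adding another view only requires one more pass through the loop, which is the property that lets a Bayesian formulation handle a variable number of views.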
