Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

Vehicle tracking is an essential task in the multi-object tracking (MOT) field. A distinct characteristic in vehicle tracking is that the trajectories of vehicles are fairly smooth in both the world coordinate and the image coordinate. Hence, models that capture motion consistencies are of high necessity. However, tracking with the standalone motionbased trackers is quite challenging because targets could get lost easily due to limited information, detection error and occlusion. Leveraging appearance information to assist object re-identification could resolve this challenge to some extent. However, doing so requires extra computation while appearance information is sensitive to occlusion as well. In this paper, we try to explore the significance of motion patterns for vehicle tracking without appearance information. We propose a novel approach that tackles the association issue for long-term tracking with the exclusive fully-exploited motion information. We address the tracklet embedding issue with the proposed reconstruct-to-embed strategy based on deep graph convolutional neural networks (GCN). Comprehensive experiments on the KITTIcar tracking dataset and UA-Detrac dataset show that the proposed method, though without appearance information, could achieve competitive performance with the state-ofthe-art (SOTA) trackers. The source code will be available at https://github.com/GaoangW/LGMTracker.

[1]  Ming-Hsuan Yang,et al.  Bayesian Multi-object Tracking Using Motion Context from Multiple Objects , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[2]  Wongun Choi,et al.  Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Enkhbayar Erdenee,et al.  Multi-class Multi-object Tracking Using Changing Point Detection , 2016, ECCV Workshops.

[4]  Jenq-Neng Hwang,et al.  Single-Camera and Inter-Camera Vehicle Tracking and 3D Speed Estimation Based on Fusion of Visual and Semantic Features , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Bastian Leibe,et al.  Track to Reconstruct and Reconstruct to Track , 2020, IEEE Robotics and Automation Letters.

[6]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Kris Kitani,et al.  Joint Detection and Multi-Object Tracking with Graph Neural Networks , 2020, ArXiv.

[8]  P. Luo,et al.  TransTrack: Multiple-Object Tracking with Transformer , 2020, ArXiv.

[9]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Silvio Savarese,et al.  JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[11]  Stefan Roth,et al.  MOT16: A Benchmark for Multi-Object Tracking , 2016, ArXiv.

[12]  Silvio Savarese,et al.  Recurrent Autoregressive Networks for Online Multi-object Tracking , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[13]  Junjie Yan,et al.  Multiple Target Tracking Based on Undirected Hierarchical Relation Hypergraph , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Mario Sznaier,et al.  The Way They Move: Tracking Multiple Targets with Similar Appearance , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Konrad Schindler,et al.  Multi-Target Tracking by Discrete-Continuous Energy Minimization , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jenq-Neng Hwang,et al.  Anomaly Candidate Identification and Starting Time Estimation of Vehicles from Traffic Videos , 2019, CVPR Workshops.

[17]  Jenq-Neng Hwang,et al.  Tracking Live Fish From Low-Contrast and Low-Frame-Rate Stereo Videos , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[18]  Volker Eiselein,et al.  High-Speed tracking-by-detection without using image information , 2017, 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[19]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Trevor Darrell,et al.  Joint Monocular 3D Vehicle Detection and Tracking , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Silvio Savarese,et al.  Learning to Track: Online Multi-object Tracking by Decision Making , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Feiyue Huang,et al.  Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking , 2020, ECCV.

[23]  Bodo Rosenhahn,et al.  Lifted Disjoint Paths with Application in Multiple Object Tracking , 2020, ICML.

[24]  Tobias Senst,et al.  Extending IOU Based Multi-Object Tracking by Visual Information , 2018, 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[25]  Jenq-Neng Hwang,et al.  Efficient Multi-person Hierarchical 3D Pose Estimation for Autonomous Driving , 2019, 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR).

[26]  Han Wang,et al.  Multiple Object Tracking With Attention to Appearance, Structure, Motion and Size , 2019, IEEE Access.

[27]  Kris Kitani,et al.  GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking With 2D-3D Multi-Feature Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Ming-Hsuan Yang,et al.  Online Multi-object Tracking via Structural Constraint Event Aggregation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jian Wang,et al.  TPM: Multiple object tracking with tracklet-plane matching , 2020, Pattern Recognit..

[30]  Charless C. Fowlkes,et al.  Globally-optimal greedy algorithms for tracking a variable number of objects , 2011, CVPR 2011.

[31]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Thomas S. Huang,et al.  Free-Form Image Inpainting With Gated Convolution , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Jenq-Neng Hwang,et al.  CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Philip H. S. Torr,et al.  HOTA: A Higher Order Metric for Evaluating Multi-object Tracking , 2020, International Journal of Computer Vision.

[35]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Zhang Xiong,et al.  Long-Term Tracking With Deep Tracklet Association , 2020, IEEE Transactions on Image Processing.

[38]  Xavier Alameda-Pineda,et al.  How to Train Your Deep Multi-Object Tracker , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Konrad Schindler,et al.  Discrete-continuous optimization for multi-target tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Jenq-Neng Hwang,et al.  Tracking Human Under Occlusion Based on Adaptive Multiple Kernels With Projected Gradients , 2013, IEEE Transactions on Multimedia.

[42]  Konrad Schindler,et al.  Continuous Energy Minimization for Multitarget Tracking , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Tankut Acarman,et al.  Efficient Multi-Object Tracking by Strong Associations on Temporal Window , 2019, IEEE Transactions on Intelligent Vehicles.

[44]  Fabio Tozeto Ramos,et al.  Simple online and realtime tracking , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[45]  Guillaume Charpiat,et al.  Multiple Object Tracking by Efficient Graph Partitioning , 2014, ACCV.

[46]  Haibin Ling,et al.  FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[48]  Bernt Schiele,et al.  Subgraph decomposition for multi-target tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Laura Leal-Taix'e,et al.  Learning a Neural Solver for Multiple Object Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Yann Dauphin,et al.  Language Modeling with Gated Convolutional Networks , 2016, ICML.

[51]  Laura Leal-Taixe,et al.  TrackFormer: Multi-Object Tracking with Transformers , 2021, ArXiv.

[52]  Philippe Calvez,et al.  SMAT: Smart Multiple Affinity Metrics for Multiple Object Tracking , 2020, ICIAR.

[53]  Gerhard Rigoll,et al.  Occlusion Handling in Tracking Multiple People Using RNN , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[54]  Charless C. Fowlkes,et al.  Learning Optimal Parameters for Multi-target Tracking with Contextual Interactions , 2016, International Journal of Computer Vision.

[55]  Ming-Hsuan Yang,et al.  UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking , 2015, Comput. Vis. Image Underst..

[56]  Andreas Geiger,et al.  FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[57]  Xiansheng Hua,et al.  FGAGT: Flow-Guided Adaptive Graph Tracking , 2020, ArXiv.

[58]  Kuk-Jin Yoon,et al.  Robust Online Multi-object Tracking Based on Tracklet Confidence and Online Discriminative Appearance Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Jianren Wang,et al.  3D Multi-Object Tracking: A Baseline and New Evaluation Metrics , 2019 .

[60]  Nuno Vasconcelos,et al.  Learning Complexity-Aware Cascades for Deep Pedestrian Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Cewu Lu,et al.  TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Jenq-Neng Hwang,et al.  Multi-Camera Tracking of Vehicles based on Deep Features Re-ID and Trajectory-Based Camera Link Models , 2019, CVPR Workshops.

[63]  Gaoang Wang,et al.  Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[64]  Thomas Brox,et al.  A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects , 2016, ArXiv.

[65]  Vladlen Koltun,et al.  Tracking Objects as Points , 2020, ECCV.

[66]  Fabio Poiesi,et al.  Online Multi-target Tracking with Strong and Weak Detections , 2016, ECCV Workshops.