MOANA: An Online Learned Adaptive Appearance Model for Robust Multiple Object Tracking in 3D

Multiple object tracking has been a challenging field, mainly due to noisy detection sets and identity switches caused by occlusion and similar appearance among nearby targets. Previous works rely on appearance models built on a single frame or a few selected frames for feature comparison, but such models cannot encode the long-term appearance changes caused by pose, viewing angle, and lighting conditions. In this paper, we propose an adaptive model that learns the relatively long-term appearance change of each target online. The proposed model is compatible with any fixed-dimensional feature or combination of such features, and its learning rates are dynamically controlled by adaptive update and spatial weighting schemes. To handle occlusion and nearby objects that share a similar appearance, we also design cross-matching and re-identification schemes built on the proposed adaptive appearance models. In addition, 3D geometry information is effectively incorporated into our data association formulation. The proposed method outperforms all state-of-the-art methods on the MOTChallenge 3D benchmark and achieves real-time computation using only a standard desktop CPU. It also outperforms the state of the art on the MOTChallenge 2D benchmark.
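The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' formulation: a per-target appearance model kept as a running average of a fixed-dimension feature vector, whose learning rate is scaled by two factors. The "adaptive" factor (assumed here to be the cosine similarity between the model and the new observation, clipped at zero) damps updates from observations that look like a different target. The "spatial" factor (assumed here to be one minus the maximum IoU with other targets' boxes) damps updates while a target is overlapped. The names `AdaptiveAppearanceModel`, `spatial_weight`, and `base_rate` are hypothetical.

```python
import numpy as np


def _unit(v):
    """Return v as a float array scaled to unit L2 norm (zero vectors stay zero)."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def spatial_weight(box, other_boxes):
    """Hypothetical spatial weight: shrink the update when the target overlaps others."""
    overlaps = [iou(box, o) for o in other_boxes]
    return 1.0 - max(overlaps, default=0.0)


class AdaptiveAppearanceModel:
    """Hypothetical running-average model over any fixed-dimension feature vector."""

    def __init__(self, init_feature, base_rate=0.1):
        self.mean = _unit(init_feature)
        self.base_rate = base_rate  # assumed upper bound on the learning rate

    def similarity(self, feature):
        """Cosine similarity between the stored model and a new feature."""
        return float(np.dot(_unit(self.mean), _unit(feature)))

    def update(self, feature, weight=1.0):
        """Blend a new observation into the model; returns the learning rate used."""
        # Adaptive factor (assumption): dissimilar observations, e.g. a possible
        # identity switch, contribute little; negative similarity contributes nothing.
        adaptive = max(self.similarity(feature), 0.0)
        rate = self.base_rate * adaptive * weight
        self.mean = (1.0 - rate) * self.mean + rate * _unit(feature)
        return rate


if __name__ == "__main__":
    model = AdaptiveAppearanceModel([1.0, 0.0])
    w = spatial_weight((0, 0, 10, 10), [(20, 20, 30, 30)])  # no overlap, so w is 1.0
    print(model.update([1.0, 0.0], w))  # prints 0.1
```

Because both factors multiply the learning rate, an occluded or ambiguous observation slows the model's drift rather than overwriting it, which is one plausible way to let a model accumulate relatively long-term appearance changes without absorbing a neighbor's appearance.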