Online self-supervised multi-instance segmentation of dynamic objects

This paper presents a method for the continuous segmentation of dynamic objects using only a vehicle-mounted monocular camera, without any prior knowledge of the objects' appearance. Prior work in online static/dynamic segmentation [1] is extended to identify multiple instances of dynamic objects by introducing an unsupervised motion clustering step. These clusters are then used to update a multi-class classifier within a self-supervised framework. In contrast to many tracking-by-detection methods, our system is able to detect dynamic objects without any prior knowledge of their visual appearance, shape, or location. Furthermore, the classifier is used to propagate the labels of an object from previous frames, which facilitates the continuous tracking of individual objects based on motion. The proposed system is evaluated using recall and false-alarm metrics, and a new multi-instance labelled dataset is used to measure the performance of segmenting multiple instances of objects.
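The unsupervised motion clustering step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or feature representation is used, so everything here is an assumption: the sketch uses a simple density-based (DBSCAN-style) scheme, represents each tracked feature as a hypothetical tuple (x, y, dx, dy) of image position and frame-to-frame displacement, and compares features with a plain Euclidean distance. The names `cluster_motion`, `eps`, and `min_pts` are illustrative only.

```python
from math import dist

NOISE = -1  # label for features that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def cluster_motion(points, eps, min_pts):
    """Density-based (DBSCAN-style) clustering of motion feature vectors.

    points:  list of (x, y, dx, dy) tuples, one per tracked feature
             (assumed representation, not taken from the paper)
    eps:     neighbourhood radius in the joint position/motion space
    min_pts: minimum neighbourhood size for a point to seed a cluster
    returns: one integer label per point; NOISE marks outliers
    """
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already handled
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Two groups of features with different motion, plus one isolated feature.
features = [
    (10, 10, 5, 0), (11, 10, 5, 0), (10, 11, 5, 0),
    (50, 50, 0, -4), (51, 50, 0, -4), (50, 51, 0, -4),
    (200, 5, 9, 9),
]
print(cluster_motion(features, eps=2.0, min_pts=3))
```

In a self-supervised setting of the kind described, cluster labels such as these would serve as training targets for the multi-class classifier. Mixing pixel positions and displacements in one unweighted distance is a simplification made for brevity.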

[1]  Dima Damen, et al.  Recognizing linked events: Searching the space of feasible explanations, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Roland Miezianko, et al.  Dictionary learning for robust background modeling, 2011, 2011 IEEE International Conference on Robotics and Automation.

[3]  Pietro Perona, et al.  Pedestrian detection: A benchmark, 2009, CVPR.

[4]  Michel Devy, et al.  Active Method for Mobile Object Detection from an Embedded Camera, Based on a Contrario Clustering, 2010, ICINCO.

[5]  Andrew Zisserman, et al.  Multiple View Geometry in Computer Vision (2nd ed), 2003.

[6]  Stefan Roth, et al.  People-tracking-by-detection and people-detection-by-tracking, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Shigeo Abe.  Pattern Classification, 2001, Springer London.

[8]  R. Real, et al.  The Probabilistic Basis of Jaccard's Index of Similarity, 1996.

[9]  Shuicheng Yan, et al.  An HOG-LBP human detector with partial occlusion handling, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Ce Liu, et al.  Exploring new representations and applications for motion analysis, 2009.

[11]  Fabio Tozeto Ramos, et al.  Online self-supervised segmentation of dynamic objects, 2013, 2013 IEEE International Conference on Robotics and Automation.

[12]  Somkiat Wangsiripitak, et al.  Avoiding moving outliers in visual SLAM by tracking moving objects, 2009, 2009 IEEE International Conference on Robotics and Automation.

[13]  Nicole Immorlica, et al.  Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.

[14]  Thierry Bouwmans, et al.  Recent Advanced Statistical Background Modeling for Foreground Detection - A Systematic Survey, 2011.

[15]  Takeo Kato, et al.  Vehicle Ego-Motion Estimation and Moving Object Detection using a Monocular Camera, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[16]  David G. Lowe, et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, 2009, VISAPP.

[17]  Frank Dellaert, et al.  GroupSAC: Efficient consensus in the presence of groupings, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Regis Hoffman, et al.  Visual classification of coarse vehicle orientation using Histogram of Oriented Gradients features, 2010, 2010 IEEE Intelligent Vehicles Symposium.

[19]  Andreas Geiger, et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Antoine Manzanera, et al.  Real Time Semi-dense Point Tracking, 2012, ICIAR.

[21]  Robert C. Bolles, et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, 1981, CACM.

[22]  Julia Hirschberg, et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, EMNLP.

[23]  Luc Van Gool, et al.  Speeded-Up Robust Features (SURF), 2008, Comput. Vis. Image Underst.

[24]  Cewu Lu, et al.  Online Robust Dictionary Learning, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Christoph Stiller, et al.  Moving on to dynamic environments: Visual odometry using feature classification, 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[26]  Luc Van Gool, et al.  Moving obstacle detection in highly dynamic scenes, 2009, 2009 IEEE International Conference on Robotics and Automation.

[27]  Gert Cauwenberghs, et al.  SVM incremental learning, adaptation and optimization, 2003, Proceedings of the International Joint Conference on Neural Networks, 2003.

[28]  Roland Siegwart, et al.  BRISK: Binary Robust invariant scalable keypoints, 2011, 2011 International Conference on Computer Vision.

[29]  Vassilios Morellas, et al.  Robust Foreground Detection In Video Using Pixel Layers, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Carlo Tomasi, et al.  Good features to track, 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Hans-Peter Kriegel, et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.