Online self-supervised segmentation of dynamic objects

We address the problem of automatically segmenting dynamic objects in an urban environment from a moving camera, without manual labelling, in an online, self-supervised manner. Input images come from a single uncalibrated camera mounted on top of a moving vehicle; we extract and match pairs of sparse features that represent the optical flow between frames. These matches are initially divided into two classes, static or dynamic: the static class contains the features that comply with the constraints imposed by the camera motion, and the dynamic class contains those that do not. This initial classification is used to incrementally train a Gaussian Process (GP) classifier that segments dynamic objects in new images. The hyperparameters of the GP covariance function are optimized online during navigation, and the self-supervised dataset is updated as relevant new data is added and redundant data is removed, resulting in near-constant computing time even after long periods of navigation. The output is a vector containing, for each pixel in the image, the probability (ranging from 0 to 1) that it belongs to the static or dynamic class, along with the corresponding uncertainty estimate of the classification. Experiments conducted in an urban environment, with cars and pedestrians as dynamic objects and no prior knowledge or additional sensors, show promising results even when the vehicle is moving at considerable speeds (up to 50 km/h). This scenario produces a large quantity of featureless regions and false matches, which is very challenging for conventional approaches. Results obtained using a portable camera device also demonstrate our algorithm's ability to generalize over different environments and configurations without any fine-tuning of parameters.
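The initial static/dynamic split can be illustrated with a minimal sketch. The abstract does not say which camera-motion constraint is used, so the code below assumes one common choice for an uncalibrated camera: the epipolar constraint encoded by a fundamental matrix `F`, here taken as already estimated. The function names, the Sampson-distance test, and the threshold value are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sampson_distance(F, x1, x2):
    """First-order epipolar error for matches x1[i] <-> x2[i].

    F  : (3, 3) fundamental matrix (assumed given, e.g. from a robust fit).
    x1 : (N, 2) pixel coordinates in the first frame.
    x2 : (N, 2) pixel coordinates in the second frame.
    """
    # Lift pixel coordinates to homogeneous form, shape (N, 3).
    h1 = np.hstack([x1, np.ones((len(x1), 1))])
    h2 = np.hstack([x2, np.ones((len(x2), 1))])

    Fx1 = h1 @ F.T   # row i is F @ h1[i]
    Ftx2 = h2 @ F    # row i is F.T @ h2[i]

    # Squared algebraic residual of the constraint x2^T F x1 = 0.
    num = np.sum(h2 * Fx1, axis=1) ** 2
    # Normalisation by the gradient of the residual.
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den


def label_matches(F, x1, x2, threshold=1.0):
    """Return a boolean array: True = dynamic, False = static.

    The threshold (in squared pixels) is a hypothetical tuning value.
    """
    return sampson_distance(F, x1, x2) > threshold


if __name__ == "__main__":
    # Toy case: pure sideways camera translation with identity intrinsics,
    # so static points may only move horizontally between frames.
    F = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
    x1 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
    x2 = np.array([[15.0, 20.0], [33.0, 40.0], [50.0, 65.0]])
    print(label_matches(F, x1, x2))
```

In this toy case the first two matches move only along their epipolar lines and are labelled static, while the third moves vertically by 5 pixels and is labelled dynamic. Labels of this kind would then serve as the self-supervised training targets for the classifier described above.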
