Identifying Unknown Instances for Autonomous Driving

In the past few years, we have seen great progress in perception algorithms, particular through the use of deep learning. However, most existing approaches focus on a few categories of interest, which represent only a small fraction of the potential categories that robots need to handle in the real-world. Thus, identifying objects from unknown classes remains a challenging yet crucial task. In this paper, we develop a novel open-set instance segmentation algorithm for point clouds which can segment objects from both known and unknown classes in a holistic way. Our method uses a deep convolutional neural network to project points into a category-agnostic embedding space in which they can be clustered into instances irrespective of their semantics. Experiments on two large-scale self-driving datasets validate the effectiveness of our proposed method.

[1]  Bastian Leibe,et al.  4D Generic Video Object Proposals , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[2]  Richard S. Zemel,et al.  End-to-End Instance Segmentation with Recurrent Attention , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Terrance E. Boult,et al.  Towards Open Set Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  D. Levitin The foundations of cognitive psychology: Core readings , 2005, History & Philosophy of Psychology.

[5]  Xinlei Chen,et al.  TensorMask: A Foundation for Dense Object Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Trevor Darrell,et al.  Learning to Segment Every Thing , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Luc Van Gool,et al.  Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[8]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Bastian Leibe,et al.  3D-BEVIS: Birds-Eye-View Instance Segmentation , 2019, GCPR.

[10]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[11]  Henrik I. Christensen,et al.  Efficient Organized Point Cloud Segmentation with Connected Components , 2013 .

[12]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[13]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Marta Coelho Antunes,et al.  The novel object recognition memory: neurobiology, test procedure, and its modifications , 2011, Cognitive Processing.

[15]  J. Delacour,et al.  A new one-trial test for neurobiological studies of memory in rats. 1: Behavioral data , 1988, Behavioural Brain Research.

[16]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[17]  Ulrich Neumann,et al.  SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Bin Yang,et al.  HDNET: Exploiting HD Maps for 3D Object Detection , 2018, CoRL.

[19]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Shu Kong,et al.  Recurrent Pixel Embedding for Instance Grouping , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Duc Thanh Nguyen,et al.  JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds With Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[24]  Leonidas J. Guibas,et al.  Frustum PointNets for 3D Object Detection from RGB-D Data , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Jianxiong Xiao,et al.  Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ashish Sharma,et al.  An Enhanced Density Based Spatial Clustering of Applications with Noise , 2009, 2009 IEEE International Advance Computing Conference.

[27]  Luc Van Gool,et al.  Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Anderson Rocha,et al.  Toward Open Set Recognition , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Min Bai,et al.  Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[31]  Gustavo Carneiro,et al.  Bayesian Semantic Instance Segmentation in Open Set World , 2018, ECCV.

[32]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Benmouiza Khalil,et al.  Density-based spatial clustering of application with noise algorithm for the classification of solar radiation time series , 2016, 2016 8th International Conference on Modelling, Identification and Control (ICMIC).

[34]  H. Barlow Vision Science: Photons to Phenomenology by Stephen E. Palmer , 2000, Trends in Cognitive Sciences.

[35]  Min Bai,et al.  UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Sanja Fidler,et al.  SGN: Sequential Grouping Networks for Instance Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Bernt Schiele,et al.  Latent Embeddings for Zero-Shot Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[39]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[40]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Joshua B. Tenenbaum,et al.  Meta-Learning for Semi-Supervised Few-Shot Classification , 2018, ICLR.

[42]  Yunchao Wei,et al.  Proposal-Free Network for Instance-Level Object Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Silvio Savarese,et al.  SEGCloud: Semantic Segmentation of 3D Point Clouds , 2017, 2017 International Conference on 3D Vision (3DV).

[46]  Ashutosh Saxena,et al.  Robotic Grasping of Novel Objects using Vision , 2008, Int. J. Robotics Res..

[47]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[48]  Kibok Lee,et al.  Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples , 2017, ICLR.

[49]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[50]  Chen Liu,et al.  MASC: Multi-scale Affinity with Sparse Convolution for 3D Instance Segmentation , 2019, ArXiv.

[51]  Zhiao Huang,et al.  Associative Embedding: End-to-End Learning for Joint Detection and Grouping , 2016, NIPS.

[52]  Peng Wang,et al.  Semantic Instance Segmentation via Deep Metric Learning , 2017, ArXiv.

[53]  Raquel Urtasun,et al.  Efficient Convolutions for Real-Time Semantic Segmentation of 3D Point Clouds , 2018, 2018 International Conference on 3D Vision (3DV).