Driving Scene Retrieval by Example from Large-Scale Data

Many machine learning approaches train networks on large datasets to reach high task performance. Collected datasets, such as Berkeley Deep Drive Video (BDD-V) for autonomous driving, contain a large variety of scenes and hence features. However, depending on the task, subsets in which certain features occur more densely support training better than others. For example, training networks on tasks such as image segmentation, bounding box detection, or tracking requires an ample number of objects in the input data. When training a network to estimate optical flow from first-person video, a disproportionate number of straight-driving scenes in the training data may lower generalization to turns. Even though some scenes of the BDD-V dataset are labeled with scene, weather, or time-of-day information, these labels may be too coarse to filter the dataset optimally for a particular training task. Furthermore, even defining an exhaustive list of good label types is complicated, as it requires choosing the concepts of the natural world most relevant to a task. Alternatively, we investigate how to use examples of desired data to retrieve more similar data from a large-scale dataset. Following the paradigm of "I know it when I see it", we present a deep learning approach that uses driving examples to retrieve similar scenes from the BDD-V dataset. Our method leverages only automatically collected labels. We show that we can reliably vary the time of day or the objects in our query examples and retrieve nearest neighbors from the dataset. Using this method, already collected data can be filtered to reduce dataset bias by discarding scenes deemed too redundant to train on.
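The core retrieval step described above, finding the nearest neighbors of a query example in an embedding space, can be illustrated with a minimal sketch. This is not the paper's implementation; the embedding dimensionality, the cosine-similarity metric, and the function name `retrieve_nearest` are illustrative assumptions, standing in for whatever learned scene embeddings the method produces.

```python
import numpy as np

def retrieve_nearest(query_emb, dataset_embs, k=5):
    """Return indices of the k dataset embeddings most similar to the query.

    Similarity is measured by cosine similarity, a common (assumed) choice
    for learned embedding spaces.
    """
    q = query_emb / np.linalg.norm(query_emb)
    d = dataset_embs / np.linalg.norm(dataset_embs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity to every dataset item
    return np.argsort(-sims)[:k]     # indices sorted by decreasing similarity

# Toy usage: 100 synthetic "scene embeddings" of dimension 8.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8))
# A query nearly identical to scene 42 should retrieve it first.
query = db[42] + 0.01 * rng.normal(size=8)
top = retrieve_nearest(query, db, k=3)
```

In practice one would replace the random vectors with embeddings produced by the trained network and, at BDD-V scale, swap the brute-force dot product for an approximate nearest-neighbor index.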
