A Weakly-Supervised Approach for Discovering Common Objects in Airport Video Surveillance Footage

Object detection in video is an important task in computer vision. Current standard detectors are typically trained in a strongly supervised way, which requires a large amount of labelled data. In contrast, in this paper we focus on object discovery in video sequences using unlabelled data. To this end, we present an approach based on two region proposal algorithms (a pretrained Region Proposal Network and an Optical Flow Proposal) whose regions of interest are then grouped by a clustering algorithm. As a result, our system does not require human collaboration except for assigning human-understandable labels to the discovered clusters. We evaluate our approach on a set of videos recorded in an airport apron area, where aeroplanes park to load passengers and luggage. Our experimental results suggest that an unsupervised approach is viable for automatic object discovery in video sequences: it obtains a CorLoc of 86.8 and an mAP of 0.374, compared to a CorLoc of 70.4 and an mAP of 0.683 achieved by a supervised Faster R-CNN trained and tested on the same dataset.
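The results above are reported partly as CorLoc (correct localization). Below is a minimal sketch of how that metric could be computed. It assumes the usual definition from the weakly supervised localization literature: a frame counts as correctly localized when its highest-scoring box has IoU ≥ 0.5 with some ground-truth box, and CorLoc is the percentage of such frames. The `(x1, y1, x2, y2)` box format, the helper names `iou` and `corloc`, and the choice to skip frames without ground truth are illustrative assumptions. This is not the paper's evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions: Sequence[Sequence[Tuple[Box, float]]],
           ground_truth: Sequence[Sequence[Box]],
           threshold: float = 0.5) -> float:
    """Percentage of frames whose top-scoring predicted box overlaps
    some ground-truth box with IoU >= threshold (hypothetical helper)."""
    hits = 0
    evaluated = 0
    for preds, gts in zip(predictions, ground_truth):
        if not gts:
            continue  # assumption: frames without annotated objects are skipped
        evaluated += 1
        if not preds:
            continue  # no predicted box counts as a miss
        best_box, _ = max(preds, key=lambda p: p[1])  # highest score wins
        if any(iou(best_box, gt) >= threshold for gt in gts):
            hits += 1
    return 100.0 * hits / evaluated if evaluated else 0.0


if __name__ == "__main__":
    # Frame 1: top box overlaps the ground truth (IoU 0.81), a hit.
    # Frame 2: top box is far from the ground truth, a miss.
    preds = [[((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.1)],
             [((0, 0, 10, 10), 0.8)]]
    gts = [[(1, 1, 10, 10)], [(50, 50, 60, 60)]]
    print(corloc(preds, gts))  # 50.0
```

Unlike mAP, CorLoc only checks the single best box per frame and ignores ranking across frames. This may help explain why a method can score higher on CorLoc yet lower on mAP than a supervised detector.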
