OmniDetector: With Neural Networks to Bounding Boxes

We propose a person detector on omnidirectional images, an accurate method to generate minimal enclosing rectangles of persons. The basic idea is to adapt the qualitative detection performance of a convolutional neural network based method, namely YOLOv2 to fish-eye images. The design of our approach picks up the idea of a state-of-the-art object detector and highly overlapping areas of images with their regions of interests. This overlap reduces the number of false negatives. Based on the raw bounding boxes of the detector we fine-tuned overlapping bounding boxes by three approaches. The non-maximum suppression, the soft non-maximum suppression and the soft non-maximum suppression with Gaussian smoothing. The evaluation was done on the PIROPO database, supplemented with bounding boxes on omnidirectional images. We achieve an average precision of 64.4 % with YOLOv2 for the class person. For this purpose we fine-tuned the soft non-maximum suppression with Gaussian smoothing.

[1]  Max Welling,et al.  Spherical CNNs , 2018, ICLR.

[2]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Yalin Bastanlar,et al.  A Direct approach for human detection with catadioptric omnidirectional cameras , 2014, 2014 22nd Signal Processing and Communications Applications Conference (SIU).

[4]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[6]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[8]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[9]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[12]  Richard Szeliski,et al.  Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Gangolf Hirtz,et al.  A fast approach for omnidirectional surveillance with multiple virtual perspective views , 2013, Eurocon 2013.

[16]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[17]  Wei Li,et al.  WebVision Database: Visual Learning and Understanding from Web Data , 2017, ArXiv.