Oriented Spatial Transformer Network for Pedestrian Detection Using Fish-Eye Camera

Pedestrian detection with fish-eye cameras is an important research topic in computer vision. It faces two primary challenges: the lack of fish-eye pedestrian datasets and the distortion of pedestrians in fish-eye images. This paper proposes one approach for each challenge. First, the projective model transformation (PMT) algorithm transforms normal images into fish-eye images. The PMT can be applied to most existing pedestrian datasets to generate corresponding fish-eye datasets, which provides sufficient training data. Second, the oriented spatial transformer network (OSTN) uses CNNs to rectify warped pedestrian features, so that pedestrians in fish-eye images are easier for detectors to recognize. The OSTN can be embedded easily into general deep-learning-based detectors, and the resulting detector can be trained end to end on fish-eye images generated by the PMT. Experiments on the ETH, KITTI, CityPersons, and real pedestrian datasets show the effectiveness of the PMT and the improvement in fish-eye pedestrian detection accuracy obtained with the OSTN.
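The abstract does not give the PMT equations, so the following is only a minimal sketch of the general idea of re-projecting a normal (pinhole) image into a fish-eye image. It assumes an equidistant fish-eye model (r = f * theta), a principal point at the image center, and nearest-neighbor sampling; the function name `perspective_to_fisheye` and the parameters `f_persp` and `f_fish` are hypothetical and are not taken from the paper.

```python
import numpy as np


def perspective_to_fisheye(image, f_persp, f_fish):
    """Warp a pinhole image into an equidistant fish-eye image.

    Assumptions (not from the paper): equidistant model r = f_fish * theta,
    principal point at the image center, nearest-neighbor sampling.
    """
    h, w = image.shape[:2]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Coordinates of every pixel in the output (fish-eye) image.
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    r_fish = np.hypot(dx, dy)

    # Equidistant model: the radius is proportional to the incidence angle.
    theta = r_fish / f_fish
    valid = theta < (np.pi / 2 - 1e-3)  # rays at 90 degrees never hit the plane

    # Pinhole model: the same ray lands at radius f_persp * tan(theta).
    r_persp = f_persp * np.tan(np.where(valid, theta, 0.0))
    scale = np.where(r_fish > 0, r_persp / np.maximum(r_fish, 1e-12), 1.0)

    # Inverse mapping: look up the source pixel for each output pixel.
    src_x = np.rint(cx + dx * scale).astype(np.int64)
    src_y = np.rint(cy + dy * scale).astype(np.int64)
    valid &= (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


# Usage on a synthetic 101x101 image with strictly positive pixel values.
image = np.arange(1, 101 * 101 + 1).reshape(101, 101)
fisheye = perspective_to_fisheye(image, f_persp=50.0, f_fish=50.0)
```

Under these assumptions the image center is left unchanged, while content is pulled increasingly toward the center at larger radii, and output pixels whose rays fall outside the source image stay zero. Converting a full detection dataset would also require mapping the bounding-box annotations through the same geometry, which this sketch omits.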
