Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex of the horizontal bounding box on each corresponding side to accurately describe a multi-oriented object. Specifically, We regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of faster R-CNN, which requires ignorable extra computation time. Extensive experimental results demonstrate that without bells and whistles, the proposed method achieves superior performances on multiple multi-oriented object detection benchmarks including object detection in aerial images, scene text detection, pedestrian detection in fisheye images.

[1]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[3]  Yang Long,et al.  Learning RoI Transformer for Oriented Object Detection in Aerial Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Cheng-Lin Liu,et al.  Arbitrary Shape Scene Text Detection With Adaptive Text Region Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Errui Ding,et al.  Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Dongyoon Han,et al.  Character Region Awareness for Text Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Shijian Lu,et al.  CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery , 2019, IEEE Transactions on Geoscience and Remote Sensing.

[10]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Wei Zhou,et al.  TextField: Learning a Deep Direction Field for Irregular Scene Text Detection , 2018, IEEE Transactions on Image Processing.

[12]  Yue Zhang,et al.  SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[14]  Xindong Wu,et al.  Object Detection With Deep Learning: A Review , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[15]  Gangolf Hirtz,et al.  Improved Person Detection on Omnidirectional Images with Non-maxima Supression , 2018, VISIGRAPP.

[16]  Shota Horiguchi,et al.  Omnidirectional Pedestrian Detection by Rotation Invariant Training , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[17]  Menglong Yan,et al.  R2CNN++: Multi-Dimensional Attention Based Rotation Invariant Detector with Robust Anchor Strategy , 2018, ArXiv.

[18]  Wenxian Yu,et al.  Toward Arbitrary-Oriented Ship Detection With Rotated Region Proposal and Discrimination Networks , 2018, IEEE Geoscience and Remote Sensing Letters.

[19]  Fei Yin,et al.  Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression , 2018, IEEE Transactions on Image Processing.

[20]  Shijian Lu,et al.  Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping , 2018, ECCV.

[21]  Peter Reinartz,et al.  Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery , 2018, ACCV.

[22]  Gangolf Hirtz,et al.  OmniDetector: With Neural Networks to Bounding Boxes , 2018, ArXiv.

[23]  Weisi Lin,et al.  Learning Markov Clustering Networks for Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[25]  Gui-Song Xia,et al.  Rotation-Sensitive Regression for Oriented Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Shuicheng Yan,et al.  Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Xiang Bai,et al.  TextBoxes++: A Single-Shot Oriented Scene Text Detector , 2018, IEEE Transactions on Image Processing.

[28]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Xiangyang Xue,et al.  Arbitrary-Oriented Scene Text Detection via Rotation Proposals , 2017, IEEE Transactions on Multimedia.

[30]  Lei Liu,et al.  Learning a Rotation Invariant Detector with Rotatable Bounding Box , 2017, ArXiv.

[31]  Xiangyu Zhang,et al.  Light-Head R-CNN: In Defense of Two-Stage Object Detector , 2017, ArXiv.

[32]  Yiping Yang,et al.  Rotated region based CNN for ship detection , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[33]  Shijian Lu,et al.  ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).

[34]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Shuchang Zhou,et al.  EAST: An Efficient and Accurate Scene Text Detector , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Xiang Bai,et al.  Detecting Oriented Text in Natural Images by Linking Segments , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Lianwen Jin,et al.  Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yiping Yang,et al.  Ship Rotated Bounding Box Space for Ship Extraction From High-Resolution Optical Satellite Images With Complex Backgrounds , 2016, IEEE Geoscience and Remote Sensing Letters.

[41]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[42]  A. Vedaldi,et al.  Synthetic Data for Text Localisation in Natural Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Wenyu Liu,et al.  Multi-oriented Text Detection with Fully Convolutional Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[46]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[49]  Wenyu Liu,et al.  A Unified Framework for Multioriented Text Detection and Recognition , 2014, IEEE Transactions on Image Processing.

[50]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[52]  Zhuowen Tu,et al.  Detecting Texts of Arbitrary Orientations in 1 Natural Images , 2012 .

[53]  Pietro Perona,et al.  Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.