Backward Compatible Object Detection Using HDR Image Content

Convolutional Neural Network (CNN)-based object detection models have achieved unprecedented accuracy in challenging detection tasks. However, existing detection models (detection heads) trained on 8-bits/pixel/channel low dynamic range (LDR) images are unable to detect relevant objects under lighting conditions where a portion of the image is either under-exposed or over-exposed. Although this issue can be addressed by introducing High Dynamic Range (HDR) content and training existing detection heads on it, there are several major challenges, such as the lack of real-life annotated HDR datasets and the extensive computational resources required for training and hyper-parameter search. In this paper, we introduce an alternative backward-compatible methodology to detect objects in challenging lighting conditions using existing CNN-based detection heads. This approach facilitates the use of HDR imaging without the immediate need to create annotated HDR datasets or to carry out the associated expensive retraining procedure. The proposed approach uses HDR imaging to capture relevant details in high-contrast scenarios. Subsequently, the scene dynamic range and wider colour gamut are compressed using HDR-to-LDR mapping techniques such that the salient highlight, shadow, and chroma details are preserved. The mapped LDR image can then be used by existing pre-trained models to extract the features required to detect objects in both the under-exposed and over-exposed regions of a scene. In addition, we conduct an evaluation to study the feasibility of using existing HDR-to-LDR mapping techniques with existing detection heads trained on standard detection datasets such as PASCAL VOC and MSCOCO. Results show that the images obtained from the mapping techniques are suitable for object detection, and that some of them can significantly outperform traditional LDR images.
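
The HDR-to-LDR mapping step described above can be illustrated with a minimal sketch. It assumes one well-known global operator, Reinhard's photographic tone reproduction, chosen here purely for illustration; the abstract does not state which mapping techniques were evaluated or how they were configured. The function name `reinhard_tonemap` and the parameters `key`, `gamma`, and `eps` are hypothetical, and pixels are given as a flat list of linear RGB triples to keep the example dependency-free.

```python
import math


def reinhard_tonemap(hdr_pixels, key=0.18, gamma=2.2, eps=1e-6):
    """Map linear HDR RGB pixels (floats >= 0) to 8-bit LDR RGB tuples.

    Illustrative sketch of a global Reinhard-style operator; not the
    method or settings used in the paper.
    """
    # Rec. 709 luminance of each pixel.
    lum = [0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in hdr_pixels]

    # Log-average luminance summarises the overall scene brightness.
    log_avg = math.exp(sum(math.log(eps + l) for l in lum) / len(lum))

    ldr = []
    for (r, g, b), l in zip(hdr_pixels, lum):
        scaled = key * l / log_avg        # scale the scene to the chosen key
        mapped = scaled / (1.0 + scaled)  # compress luminance into [0, 1)
        ratio = mapped / l if l > 0 else 0.0

        out = []
        for c in (r, g, b):
            v = min(max(c * ratio, 0.0), 1.0)  # clip to the displayable range
            out.append(int(round(255 * v ** (1.0 / gamma))))  # gamma-encode to 8 bits
        ldr.append(tuple(out))
    return ldr


# Three grey pixels spanning five orders of magnitude in luminance.
pixels = [(0.01, 0.01, 0.01), (1.0, 1.0, 1.0), (1000.0, 1000.0, 1000.0)]
ldr_pixels = reinhard_tonemap(pixels)
```

The resulting 8-bit values can be assembled into an ordinary LDR image and passed unchanged to a detector pre-trained on LDR data, which is the sense in which the approach is backward compatible. A practical pipeline would more likely operate on whole image arrays with a numerical library rather than on Python lists.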
