ISP Distillation

Nowadays, many captured images are "observed" only by machines and never by humans, for example, images from the cameras of robots or autonomous cars. High-level machine vision models, such as those for object recognition or semantic segmentation, assume that images have been transformed into some canonical image space by the camera image signal processor (ISP). However, the camera ISP is optimized to produce images that are visually pleasing to human observers, not for machines; thus, one may save the ISP compute time by applying the vision models directly to the RAW data. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop without annotating RAW data, we use a dataset of RAW and RGB image pairs, which can be acquired easily without any human labeling. We then use knowledge distillation to train a model that operates directly on RAW data, such that its predictions for RAW images are aligned with the predictions of an off-the-shelf pre-trained model on the corresponding processed RGB images. Our experiments show that, for object classification and semantic segmentation on RAW images, our approach performs significantly better than a model trained on labeled RAW images. Its predictions also reasonably match those of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
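The abstract does not spell out the distillation objective. Below is a minimal sketch, assuming the classic temperature-softened KL divergence of Hinton et al. (2015). The function names, the per-pixel averaging used for segmentation, and the default temperature of 4.0 are illustrative assumptions, not the authors' exact formulation. The teacher scores the processed RGB image and the student scores the paired RAW image, so no human labels enter the loss.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax computed with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 4.0) -> float:
    """T^2-scaled KL(teacher || student) between softened class distributions.

    student_logits: output of the (hypothetical) RAW-input student for one image.
    teacher_logits: output of the frozen pre-trained model on the paired RGB image.
    No ground-truth label is used anywhere.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same set of classes")
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Multiplying by T^2 compensates for the 1/T^2 scaling of soft-target
    # gradients noted by Hinton et al.
    return kl * temperature ** 2


def segmentation_distillation_loss(student_pixels: Sequence[Sequence[float]],
                                   teacher_pixels: Sequence[Sequence[float]],
                                   temperature: float = 4.0) -> float:
    """Per-pixel distillation loss averaged over all pixels (assumed variant).

    Each argument is a flattened list of per-pixel logit vectors (H*W entries).
    """
    if len(student_pixels) != len(teacher_pixels) or not student_pixels:
        raise ValueError("expected equally sized, non-empty pixel lists")
    losses = [distillation_loss(s, t, temperature)
              for s, t in zip(student_pixels, teacher_pixels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # made-up teacher logits on an RGB image
    student = [1.5, 0.7, -0.8]  # made-up student logits on the paired RAW image
    print(distillation_loss(student, teacher))
```

In an actual training loop, the teacher would stay frozen and only the student's parameters would be updated to minimize this loss, with gradients computed by a deep learning framework's automatic differentiation rather than in plain Python.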
