Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report

Camera scene detection is among the most popular computer vision problems on smartphones. While phone vendors have developed many custom solutions for this task, none of these models has been publicly available until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate real-time performance on smartphones and IoT platforms. For this, the participants were provided with the large-scale CamSDD dataset, consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform that can be found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators and can run at more than 100-200 FPS on the majority of recent smartphone platforms while achieving a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.
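For readers unfamiliar with the top-3 accuracy metric quoted above, the following minimal sketch shows how such a score can be computed from per-class prediction scores. It is an illustration only and not the challenge's evaluation script: the function name `top_k_accuracy` and the toy scores and labels are assumptions made for this example.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical toy data: 3 samples, 4 classes (the challenge itself uses 30).
scores = [
    [0.10, 0.50, 0.25, 0.15],  # true class 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
    [0.05, 0.15, 0.30, 0.50],  # true class 1 is ranked third
]
labels = [1, 3, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1: {top1:.3f}, top-3: {top3:.3f}")
```

With these toy inputs, one of the three samples is counted as correct under top-1 and two under top-3, which illustrates why top-3 accuracy is never lower than top-1 accuracy on the same predictions.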
