Google's Cloud Vision API is Not Robust to Noise

Google has recently introduced the Cloud Vision API for image analysis. According to the demonstration website, the API "quickly classifies images into thousands of categories, detects individual objects and faces within images, and finds and reads printed words contained within images." It can also be used to "detect different types of inappropriate content from adult to violent content." In this paper, we evaluate the robustness of the Google Cloud Vision API to input perturbations. In particular, we show that by adding sufficient noise to an image, we can cause the API to generate completely different outputs for the noisy image, even though a human observer can still perceive its original content. We show that the attack is consistently successful by performing extensive experiments on different image types, including natural images, images containing faces, and images with text. For instance, using images from the ImageNet dataset, we found that adding an average of 14.25% impulse noise is enough to deceive the API. Our findings indicate that the API is vulnerable in adversarial environments; for example, an adversary could bypass an image filtering system by adding noise to inappropriate images. We then show that when a noise filter is applied to the input images, the API generates mostly the same outputs for the restored images as for the originals. This observation suggests that the Cloud Vision API can readily benefit from noise filtering, without any need to update its image analysis algorithms.
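To make the noise model concrete: the attack corrupts a fraction of the pixels with impulse (salt-and-pepper) noise, and the defense restores the image with a standard noise filter before querying the API. The Python sketch below, using NumPy and SciPy, illustrates both steps under stated assumptions; the function names and the choice of a median filter are our own illustrative stand-ins, not the paper's exact implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

def add_impulse_noise(image, density=0.1425, rng=None):
    """Corrupt a `density` fraction of pixels with salt/pepper values.

    density=0.1425 mirrors the paper's reported average of 14.25%;
    this helper is an illustrative sketch, not the authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    corrupt = rng.random(image.shape[:2]) < density   # which pixels to hit
    salt = rng.random(image.shape[:2]) < 0.5          # half white, half black
    noisy[corrupt & salt] = 255
    noisy[corrupt & ~salt] = 0
    return noisy

def restore(noisy, kernel=3):
    """Median filtering, a standard remedy for impulse noise.

    The paper evaluates noise filtering as a defense; the specific
    filter used here (scipy's median_filter) is an assumed stand-in.
    """
    # Filter each color channel independently for H x W x 3 images.
    size = (kernel, kernel, 1) if noisy.ndim == 3 else kernel
    return median_filter(noisy, size=size)

# Example: noisy = add_impulse_noise(img); restored = restore(noisy)
```

In this setup, the noisy image would be sent to the API to test the attack, and the restored image to test the defense; the restoration step requires no change to the API's underlying models.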
