Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered core intellectual property of their owners, are now stored on billions of untrusted devices and are subject to theft. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? How much are (stolen) models worth? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. Alarmingly, 41% of these ML apps do not protect their models at all; the models can be trivially stolen from the app packages. Even among the apps that do protect or encrypt their models, we were able to extract the models from 66% of them using unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked, and that attackers are highly motivated to steal them. Drawing on our large-scale study, we report insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.
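The pipeline itself is not detailed in this abstract, but the two static checks it implies (spotting model files bundled in an app package and judging whether they are encrypted) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' actual tooling: the file suffixes and the entropy threshold below are illustrative assumptions, and high byte entropy only suggests encryption or compression rather than proving it.

import math
import zipfile
from collections import Counter

# Hypothetical suffixes used by popular on-device ML frameworks
# (TensorFlow Lite, Caffe, Core ML, ONNX, ncnn). A real pipeline
# would also match the bundled framework libraries, not just names.
MODEL_SUFFIXES = (".tflite", ".lite", ".pb", ".caffemodel",
                  ".mlmodel", ".onnx", ".param")

def shannon_entropy(data: bytes) -> float:
    # Bits per byte: plaintext model weights usually score noticeably
    # below 8.0, while encrypted blobs sit very close to it.
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def scan_apk(apk_path: str, threshold: float = 7.9):
    # An APK is a ZIP archive; walk its entries for model-like files
    # and flag those whose entropy suggests no encryption at all.
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if name.lower().endswith(MODEL_SUFFIXES):
                h = shannon_entropy(apk.read(name))
                yield name, h, h < threshold   # True => likely unprotected

if __name__ == "__main__":
    import sys
    for name, h, unprotected in scan_apk(sys.argv[1]):
        status = "LIKELY UNPROTECTED" if unprotected else "likely encrypted"
        print(f"{name}: entropy={h:.2f} bits/byte ({status})")

A check of this kind accounts for the first headline number (41% of ML apps ship readable models). Recovering encrypted models, by contrast, requires the dynamic analysis the abstract mentions, e.g. dumping the decrypted buffer from process memory at inference time, which no static scan can achieve.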
