As services based on Machine Learning (ML) applications find increasing use, there is a growing risk of attacks against such systems. Adversarial machine learning has recently received much attention, in which an adversary crafts or manipulates an input to cause an ML system to misclassify it. Another attack of concern, and the subject of this paper, is when an adversary with access to an ML model can reverse engineer attributes of a target class, creating a privacy risk. Such attacks use non-sensitive data obtainable by the adversary together with the confidence values returned by the ML model to infer sensitive attributes of the target user. Model inversion attacks may be classified as white-box, where the ML model is known to the attacker, or black-box, where the adversary does not know the internals of the model. If the attacker has access to non-sensitive data of a target user, they can infer sensitive data by applying gradient ascent on the confidence returned by the model; a black-box attack can therefore be mounted by numerically approximating the gradient used for gradient ascent. In this work, we present MLPrivacyGuard, a countermeasure against black-box model inversion attacks. The countermeasure consists of adding controlled noise to the output of the confidence function. It is important to preserve the accuracy of prediction/classification for legitimate users of the model while preventing attackers from inferring sensitive data, which involves a trade-off between misclassification error and the effectiveness of the defense. Our experimental results demonstrate that when noise is injected from a long-tailed distribution, low misclassification error and a strong defense can be attained simultaneously: the numerical approximation of gradient ascent fails to converge and model inversion attacks are neutralized.
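To make the mechanism concrete, the following is a minimal sketch (not the paper's implementation) of a black-box inversion loop that estimates the gradient of the returned confidence by finite differences, together with a defense that perturbs the confidence with long-tailed (here, Laplace) noise before returning it. The toy confidence function, the noise scale, and all other parameter values are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)  # hidden attribute the attacker tries to recover

def confidence(x):
    """Toy confidence function: peaks when x matches the hidden target."""
    return np.exp(-np.sum((x - target) ** 2))

def defended_confidence(x, scale=0.05):
    """Defense sketch: add long-tailed (Laplace) noise to the returned confidence."""
    return confidence(x) + rng.laplace(scale=scale)

def inversion_attack(query, steps=500, eps=1e-3, lr=0.5, dim=8):
    """Gradient ascent on the (possibly noisy) confidence, with the gradient
    approximated numerically because the model internals are unknown."""
    x = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for i in range(dim):
            e = np.zeros(dim)
            e[i] = eps
            # Central finite difference along coordinate i
            grad[i] = (query(x + e) - query(x - e)) / (2 * eps)
        x += lr * grad
    return x

undefended = inversion_attack(confidence)
defended = inversion_attack(defended_confidence)
print("attack error without defense:", np.linalg.norm(undefended - target))
print("attack error with noisy confidence:", np.linalg.norm(defended - target))
```

In this sketch the noise swamps the small finite-difference quotients, so the estimated gradient becomes unreliable and the ascent fails to home in on the target, which is the intuition behind the defense described above.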