SocInf: Membership Inference Attacks on Social Media Health Data With Machine Learning

Social media networks have grown rapidly, generating massive amounts of social data that can reveal the behavioral or emotional propensities of users. Many social researchers use machine learning to build social media analytic models that effectively detect abnormal behaviors or mental illnesses from such data. Although researchers generally publish only the prediction interfaces of these models, the interfaces may still leak information about the individual data records on which the models were trained. Knowing that a certain user’s social media record was used to train a model can breach that user’s privacy. In this paper, we present SocInf, which focuses on this fundamental problem, known as membership inference. The key idea of SocInf is to construct a mimic model whose prediction behavior is similar to that of the public model, and then to exploit the mimic model to expose the differences between predictions on the training and testing data sets. By analyzing the predictions of the mimic model, SocInf can infer whether a given record is in the victim model’s training set. We empirically evaluate the attack performance of SocInf on machine learning models trained with XGBoost, logistic regression, and an online cloud platform. Using realistic data, the experimental results show that SocInf achieves an inference accuracy and precision of 73% and 84%, respectively, on average, and of 83% and 91% at best.
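The mimic-model idea can be illustrated with a minimal sketch. It is a generic shadow-model-style threshold attack written for illustration, not SocInf's actual procedure, which the abstract does not specify. Everything in it is an assumption: the synthetic 2-D records, the deliberately overfitted `KernelModel` standing in for the public model, the premise that the prediction interface returns a class probability, and the helper names `make_records`, `confidence`, `best_threshold`, and `run_attack`.

```python
import math
import random


def make_records(n, rng):
    """Draw n toy 2-D records with noisy binary labels (synthetic stand-in data)."""
    records = []
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        y = 1 if x[0] + x[1] + rng.gauss(0, 1) > 0 else 0
        records.append((x, y))
    return records


class KernelModel:
    """Hypothetical classifier: narrow-kernel vote over its training set (overfits by design)."""

    def __init__(self, records, bandwidth=0.05):
        self.records = records
        self.bandwidth = bandwidth

    def predict_proba(self, x):
        # Each class starts with a prior weight of 0.5, so regions with no
        # nearby training points yield a probability close to 0.5.
        weights = [0.5, 0.5]
        for (px, py), label in self.records:
            dist2 = (px - x[0]) ** 2 + (py - x[1]) ** 2
            weights[label] += math.exp(-dist2 / (2 * self.bandwidth ** 2))
        return weights[1] / (weights[0] + weights[1])

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0


def confidence(model, x):
    """Confidence of the predicted class, used here as the membership signal."""
    p = model.predict_proba(x)
    return max(p, 1 - p)


def best_threshold(member_scores, nonmember_scores):
    """Pick the confidence cut-off that best separates the two score lists."""
    total = len(member_scores) + len(nonmember_scores)
    best_t, best_acc = 0.5, 0.0
    for t in sorted(set(member_scores + nonmember_scores)):
        correct = sum(s >= t for s in member_scores)
        correct += sum(s < t for s in nonmember_scores)
        if correct / total > best_acc:
            best_t, best_acc = t, correct / total
    return best_t


def run_attack(seed=0, n=300):
    rng = random.Random(seed)
    members = make_records(n, rng)       # victim's private training set
    nonmembers = make_records(n, rng)    # records the victim never saw
    victim = KernelModel(members)        # exposed only through predictions

    # Attacker side: label its own records by querying the victim, then train
    # a mimic model on half of them and keep the other half as held-out data.
    shadow = [(x, victim.predict(x)) for x, _ in make_records(2 * n, rng)]
    mimic_in, mimic_out = shadow[:n], shadow[n:]
    mimic = KernelModel(mimic_in)

    # Calibrate on the mimic, where membership is known to the attacker.
    threshold = best_threshold(
        [confidence(mimic, x) for x, _ in mimic_in],
        [confidence(mimic, x) for x, _ in mimic_out],
    )

    # Apply the calibrated threshold to the victim's confidences.
    tp = sum(confidence(victim, x) >= threshold for x, _ in members)
    fp = sum(confidence(victim, x) >= threshold for x, _ in nonmembers)
    accuracy = (tp + (n - fp)) / (2 * n)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision, threshold


accuracy, precision, threshold = run_attack()
print(f"accuracy={accuracy:.2f} precision={precision:.2f} threshold={threshold:.2f}")
```

The attacker knows which records trained the mimic, so the confidence gap between its training and held-out records can be measured directly and then transferred to the victim. The gap in this toy setup comes from the model overfitting by construction; the numbers it prints say nothing about the results reported above.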
