Privacy-Preserving Boosting with Random Linear Classifiers

We propose SecureBoost, a privacy-preserving predictive modeling framework that allows service providers (SPs) to build powerful boosting models over encrypted or randomly masked user-submitted data. SecureBoost uses random linear classifiers (RLCs) as base classifiers. A Cryptographic Service Provider (CSP) manages the keys and assists the SP's processing to reduce the complexity of the protocol constructions. The SP learns only the base models (i.e., the RLCs), while the CSP learns only the weights of the base models and a limited leakage function. This split in parameter holding prevents either party from abusing the final model or mounting model-based attacks. We evaluate two constructions of SecureBoost, HE+GC and SecSh+GC, which combine the primitives of homomorphic encryption, garbled circuits, and random masking. We show that SecureBoost efficiently learns high-quality boosting models from protected user-generated data at practical cost.
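To make the underlying learning logic concrete, below is a minimal plaintext sketch of AdaBoost-style boosting over random linear classifiers. The function names, the exact re-weighting scheme, and the comments about which party would hold which parameters are illustrative assumptions rather than the paper's construction; in SecureBoost itself these computations run over encrypted or randomly masked data under the HE+GC or SecSh+GC protocols.

```python
# Plaintext sketch (not the secure protocol): AdaBoost-style boosting where each
# base classifier is a random hyperplane (random linear classifier, RLC).
# Assumed split of parameters: the SP would see only (w, b) of each RLC, the CSP
# would see only the base-model weights alpha.
import numpy as np

def train_rlc_boost(X, y, n_rounds=20, rng=None):
    """X: (n_samples, n_features); y: labels in {-1, +1}.
    Returns a list of (w, b, alpha) triples."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    sample_w = np.full(n, 1.0 / n)          # boosting distribution over samples
    model = []
    for _ in range(n_rounds):
        # Draw a random linear classifier: sign(X @ w + b).
        w = rng.normal(size=d)
        b = rng.normal()
        pred = np.sign(X @ w + b)
        pred[pred == 0] = 1
        err = np.sum(sample_w[pred != y])
        if err > 0.5:                        # flip a worse-than-chance hyperplane
            w, b, pred, err = -w, -b, -pred, 1.0 - err
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # base-model weight (held by CSP)
        sample_w *= np.exp(-alpha * y * pred)   # re-weight the samples
        sample_w /= sample_w.sum()
        model.append((w, b, alpha))
    return model

def predict(model, X):
    """Weighted vote of the random linear base classifiers."""
    score = sum(alpha * np.sign(X @ w + b) for w, b, alpha in model)
    return np.sign(score)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.5 * X[:, 1])
    y[y == 0] = 1
    model = train_rlc_boost(X, y, n_rounds=50, rng=rng)
    print("train accuracy:", np.mean(predict(model, X) == y))
```

The sketch highlights why the parameter split matters: the random hyperplanes (w, b) alone are data-independent and reveal little, while the weights alpha alone cannot reproduce the ensemble, so neither the SP nor the CSP can reconstruct or abuse the final model on its own.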
