Assisted Learning: A Framework for Multi-Organization Learning

In a growing number of AI scenarios, collaboration among different organizations or agents (e.g., humans and robots, mobile units) is essential to accomplishing an organization-specific mission. However, to avoid leaking useful and possibly proprietary information, organizations typically enforce stringent security constraints on sharing modeling algorithms and data, which significantly limits collaboration. In this work, we introduce the Assisted Learning framework, in which organizations assist each other in supervised learning tasks without revealing any organization’s algorithm, data, or even task. An organization seeks assistance by broadcasting task-specific but nonsensitive statistics and incorporating others’ feedback over one or more iterations to improve its predictive performance. Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle learning performance, as if data and training processes were centralized.
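
The broadcast-and-feedback loop described above can be illustrated with a small sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state: the broadcast statistic is taken to be the current fitted residual, both organizations hold different features for the same aligned rows, each private learner is a plain least-squares model, and no privacy mechanism is applied. The names `fit_linear` and `assisted_fit` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function.
    Stand-in for whatever private learner an organization actually uses."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def assisted_fit(X_a, X_b, y, rounds=5):
    """Hypothetical two-party loop: A holds (X_a, y), B holds X_b.
    Only residual vectors cross the boundary, never raw features or models."""
    residual = y.copy()
    for _ in range(rounds):
        f_a = fit_linear(X_a, residual)     # A fits on its own features
        residual = residual - f_a(X_a)      # A broadcasts this residual (assumed statistic)
        f_b = fit_linear(X_b, residual)     # B fits the residual privately
        residual = residual - f_b(X_b)      # B's fitted values are fed back to A
    return residual

# Synthetic data: the label depends on features held by both organizations.
rng = np.random.default_rng(0)
n = 200
X_a = rng.normal(size=(n, 2))
X_b = rng.normal(size=(n, 2))
y = X_a @ np.array([1.0, -2.0]) + X_b @ np.array([0.5, 3.0]) + 0.1 * rng.normal(size=n)

# Baseline: A learns alone from its own features.
solo = fit_linear(X_a, y)
mse_solo = float(np.mean((y - solo(X_a)) ** 2))

# With assistance from B over a few rounds.
mse_assisted = float(np.mean(assisted_fit(X_a, X_b, y, rounds=5) ** 2))

print(f"in-sample MSE, A alone:  {mse_solo:.3f}")
print(f"in-sample MSE, assisted: {mse_assisted:.3f}")
```

In this toy setting the assisted in-sample error falls well below what A reaches alone, since B's features explain most of the remaining variance. The errors are measured on the training rows only, so the sketch says nothing about generalization or about what a residual vector may leak.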
