Assisted Learning: A Framework for Multiple Organizations' Learning

Large-scale multimodal data are being rapidly generated from interactions between humans and machines. Sharing heterogeneous data among multiple organizations for collaborative learning typically involves a tradeoff between learning efficiency and data privacy: in general, a method that achieves optimal performance in one aspect sacrifices the other. To tackle this challenge, we introduce the Assisted Learning framework, in which a service provider Bob assists a user Alice with supervised learning tasks without transmitting Bob's private algorithm or data. Bob assists Alice either by building a predictive model using Alice's labels or by improving Alice's learning through iterative transmissions of task-specific statistics. Theoretical analysis shows that the proposed method can achieve lossless learning performance for certain models. We also demonstrate the wide applicability of the proposed approach through various experiments, including real-world medical benchmarks.
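To make the iterative mode more concrete, here is a minimal sketch in Python with NumPy. It assumes that both parties fit ordinary least-squares linear models on vertically partitioned features of the same n samples, that Alice alone holds the labels, and that only n-vectors of residuals and fitted values cross the boundary. The function names `local_fit` and `assisted_linear_regression` and the fixed round count are hypothetical. The exchange pattern (Alice sends residuals, Bob returns fitted values) is a backfitting-style alternation chosen for illustration and is not necessarily the authors' exact protocol. For linear least squares, this alternation converges to the fit obtained by pooling both feature sets, which illustrates one sense in which learning can be "lossless" for certain models.

```python
import numpy as np


def local_fit(X, target):
    """Ordinary least squares using one party's private features only."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def assisted_linear_regression(X_alice, X_bob, y, rounds=50):
    """Hypothetical two-party protocol in which only n-vectors are exchanged.

    Alice holds X_alice and the labels y; Bob holds X_bob. Neither party
    sends raw features or model coefficients to the other.
    """
    bob_fit = np.zeros_like(y)  # Bob's latest fitted values, as received by Alice
    for _ in range(rounds):
        # Alice fits her features to the part of y that Bob has not explained.
        w_alice = local_fit(X_alice, y - bob_fit)
        alice_fit = X_alice @ w_alice
        # Alice transmits only her residual vector (a task-specific statistic) to Bob.
        residual_to_bob = y - alice_fit
        # Bob fits his private features to that residual and returns fitted values.
        w_bob = local_fit(X_bob, residual_to_bob)
        bob_fit = X_bob @ w_bob
    return w_alice, w_bob, alice_fit + bob_fit


# Usage on synthetic data: compare against an oracle that pools all features.
rng = np.random.default_rng(0)
n = 200
X_alice = rng.normal(size=(n, 3))
X_bob = rng.normal(size=(n, 2))
y = (X_alice @ np.array([1.0, -2.0, 0.5])
     + X_bob @ np.array([3.0, 1.0])
     + 0.1 * rng.normal(size=n))

_, _, pred_assisted = assisted_linear_regression(X_alice, X_bob, y)
X_joint = np.hstack([X_alice, X_bob])
joint_coef, *_ = np.linalg.lstsq(X_joint, y, rcond=None)
pred_joint = X_joint @ joint_coef
print("max gap vs. pooled fit:", np.max(np.abs(pred_assisted - pred_joint)))
```

Each round is a block coordinate descent step on the pooled least-squares objective, so the gap shrinks geometrically at a rate governed by how correlated the two parties' feature spaces are; strongly correlated features would need more rounds.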
