Knowledge Aggregation via Epsilon Model Spaces

In many practical applications, machine learning is distributed across multiple agents, each learning a different task and/or learning from a different dataset. We present Epsilon Model Spaces (EMS), a framework for learning a global model by aggregating the local learning performed by each agent. Our approach forgoes data sharing between agents, makes no assumptions about how data are distributed across agents, and requires minimal inter-agent communication. We empirically validate our techniques on MNIST experiments and discuss how EMS generalizes to a wide range of problem settings, including federated averaging and catastrophic forgetting. We believe our framework is among the first to lay out a general methodology for "combining" distinct models.
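The abstract does not spell out the EMS aggregation rule itself, but it names federated averaging as one setting the framework covers. As a point of reference only, here is a minimal sketch of the standard federated-averaging baseline (per-agent parameter vectors combined with dataset-size weights); the function name and the use of flat NumPy parameter vectors are illustrative assumptions, not the paper's method.

```python
import numpy as np

def federated_average(local_weights, num_examples):
    """Combine per-agent parameter vectors into one global vector,
    weighting each agent by the number of training examples it saw."""
    total = sum(num_examples)
    return sum(w * (n / total) for w, n in zip(local_weights, num_examples))

# Three agents with identically shaped parameters but different data sizes.
agents = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
counts = [10, 20, 70]

global_w = federated_average(agents, counts)  # weighted toward the 70-example agent
```

Note that this baseline requires all agents to share one architecture and parameterization; relaxing that requirement is precisely what a general methodology for combining distinct models would need to address.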
