Model Reuse With Reduced Kernel Mean Embedding Specification

Given a publicly available pool of machine learning models built for various tasks, when a user plans to build a model for her own machine learning application, is it possible to build upon models in the pool so that the effort already invested in these existing models is reused rather than starting from scratch? A grand challenge here is how to identify models that are helpful for the current application without accessing the raw training data of the models in the pool. In this paper, we present a two-phase framework. In the upload phase, when a model is uploaded into the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model. In the deployment phase, the relatedness between the current task and the pre-trained models is measured through their RKME specifications. Theoretical results and extensive experiments validate the effectiveness of our approach.
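To make the two phases concrete, below is a minimal NumPy sketch under simplifying assumptions: the reduced points Z are fixed as a random subsample and only their weights are solved for, whereas the actual method also optimizes the reduced points themselves; all function names are illustrative, not the paper's API.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Gaussian RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2).
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    def build_rkme(X, m=10, gamma=1.0, seed=0):
        # Upload phase (simplified): compress the empirical kernel mean
        # embedding (1/n) sum_i k(x_i, .) into m weighted points (Z, beta)
        # by minimizing the RKHS distance to sum_j beta_j k(z_j, .).
        # Here Z is a random subsample; with Z fixed, the optimal beta
        # solves the linear system K_zz beta = (1/n) K_zx 1.
        rng = np.random.default_rng(seed)
        Z = X[rng.choice(len(X), size=m, replace=False)]
        K_zz = rbf_kernel(Z, Z, gamma)
        K_zx = rbf_kernel(Z, X, gamma)
        beta = np.linalg.solve(K_zz + 1e-8 * np.eye(m), K_zx.mean(axis=1))
        return Z, beta

    def mmd_to_rkme(X_task, Z, beta, gamma=1.0):
        # Deployment phase: squared MMD between the task's empirical
        # embedding and a model's RKME specification; smaller = more related.
        n = len(X_task)
        t1 = rbf_kernel(X_task, X_task, gamma).mean()
        t2 = beta @ rbf_kernel(Z, Z, gamma) @ beta
        t3 = 2 * (np.ones(n) / n) @ rbf_kernel(X_task, Z, gamma) @ beta
        return t1 + t2 - t3

A user would then evaluate mmd_to_rkme against every specification in the pool, using only her own unlabeled task data and the (Z, beta) pairs, and reuse the model with the smallest value; no raw training data of the pool models is ever exchanged.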
