Unifying distillation and privileged information

Distillation (Hinton et al., 2015) and privileged information (Vapnik & Izmailov, 2015) are two techniques that enable machines to learn from other machines. This paper unifies the two into generalized distillation, a framework for learning from multiple machines and data representations. We provide theoretical and causal insight into the inner workings of generalized distillation, extend it to unsupervised, semi-supervised, and multitask learning scenarios, and illustrate its efficacy in a variety of numerical simulations on both synthetic and real-world data.
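The abstract does not spell out the procedure, so the following is only a minimal sketch of how a teacher trained on a privileged representation might pass soft labels to a student that sees the regular representation. The linear softmax models, the temperature `T`, the imitation weight `lam`, the function names, and the synthetic data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax of logits z, softened by temperature T."""
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, targets, steps=500, lr=0.5):
    """Fit a bias-free linear softmax classifier to (possibly soft) targets
    by gradient descent on the cross-entropy loss."""
    W = np.zeros((X.shape[1], targets.shape[1]))
    for _ in range(steps):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - targets) / len(X)  # gradient of mean cross-entropy
    return W

def generalized_distillation(X, X_priv, y, n_classes, T=2.0, lam=0.5):
    """Hypothetical helper: teacher on privileged features, student on regular ones."""
    Y = np.eye(n_classes)[y]                       # one-hot hard labels
    W_teacher = fit_softmax_regression(X_priv, Y)  # step 1: train the teacher
    S = softmax(X_priv @ W_teacher, T)             # step 2: soft labels at temperature T
    # Step 3: cross-entropy is linear in its target, so the weighted loss
    # (1 - lam) * CE(Y, p) + lam * CE(S, p) equals CE((1 - lam) * Y + lam * S, p).
    targets = (1 - lam) * Y + lam * S
    W_student = fit_softmax_regression(X, targets)
    return W_teacher, W_student

# Synthetic illustration (assumed setup): the privileged view is clean,
# the regular view is a noisy copy of it.
rng = np.random.default_rng(0)
n = 200
X_priv = rng.normal(size=(n, 2))
y = (X_priv[:, 0] > 0).astype(int)
X = X_priv + rng.normal(scale=1.0, size=(n, 2))

W_t, W_s = generalized_distillation(X, X_priv, y, n_classes=2)
teacher_acc = np.mean(np.argmax(X_priv @ W_t, axis=1) == y)
student_acc = np.mean(np.argmax(X @ W_s, axis=1) == y)
```

In this sketch, setting `lam=0` trains the student on hard labels only, while `lam=1` makes it imitate the teacher's soft labels exclusively. At test time only `W_s` and the regular features `X` are needed.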

[1] Bernhard Schölkopf, et al. Improving the accuracy and speed of support vector learning machines, 1997, NIPS.

[2] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[3] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[4] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.

[5] Rich Caruana, et al. Model compression, 2006, KDD '06.

[6] Jason Weston, et al. Inference with the Universum, 2006, ICML.

[7] Bernhard Schölkopf, et al. An Analysis of Inference with the Universum, 2007, NIPS.

[8] Vladimir Vapnik, et al. A new learning paradigm: Learning using privileged information, 2009, Neural Networks.

[9] Jason Weston, et al. Curriculum learning, 2009, ICML '09.

[10] V. Vapnik, et al. On the theory of learning with Privileged Information, 2010, NIPS.

[11] Bernhard Schölkopf, et al. Causal Inference Using the Algorithmic Markov Condition, 2008, IEEE Transactions on Information Theory.

[12] Bernardete Ribeiro, et al. Financial distress model prediction using SVM+, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[13] Uwe Aickelin, et al. Privileged information for data clustering, 2012, Inf. Sci.

[14] Bernhard Schölkopf, et al. On causal and anticausal learning, 2012, ICML.

[15] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[16] Christoph H. Lampert, et al. Learning to Rank Using Privileged Information, 2013, 2013 IEEE International Conference on Computer Vision.

[17] Peter Tiño, et al. Incorporating Privileged Information Through Metric Learning, 2013, IEEE Transactions on Neural Networks and Learning Systems.

[18] Léon Bottou, et al. From machine learning to machine reasoning, 2011, Machine Learning.

[19] Bernhard Schölkopf, et al. Randomized Nonlinear Component Analysis, 2014, ICML.

[20] Christoph H. Lampert, et al. Mind the Nuisance: Gaussian Process Classification using Privileged Noise, 2014, NIPS.

[21] Ivan Laptev, et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Christoph H. Lampert, et al. Learning to Transfer Privileged Information, 2014, ArXiv.

[23] Bernt Schiele, et al. Learning using privileged information: SVM+ and weighted SVM, 2013, Neural Networks.

[24] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.

[25] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.

[26] Rauf Izmailov, et al. Learning using privileged information: similarity control and knowledge transfer, 2015, J. Mach. Learn. Res.

[27] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016.

[28] Omer Levy, et al. Simulating Action Dynamics with Neural Process Networks, 2018, ICLR.