Contextual Learning

Supervised, semi-supervised, and unsupervised learning estimate a function given input/output samples. Generalization to unseen samples requires prior knowledge (priors) about this function. However, some priors cannot be expressed by taking only the function, its input, and its output into account. In this paper, we propose contextual learning, which uses contextual data to define such priors. Contextual data come from neither the input space nor the output space of the function, yet contain information useful for learning it. We exploit this information by formulating priors about how the contextual data relate to the target function. Incorporating these priors regularizes learning and thereby improves generalization. Contextual learning subsumes a variety of related approaches, e.g., multi-task learning and learning using privileged information. Our contributions are (i) a new perspective that connects these previously isolated approaches, (ii) insights into how these methods incorporate useful priors by implementing different patterns, (iii) a simple way to apply them to novel problems, and (iv) a systematic experimental evaluation of these patterns in two supervised learning tasks.
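To make the idea concrete, below is a minimal sketch of one common pattern for incorporating a contextual prior: a shared encoder is trained jointly on the main task and on predicting the contextual data, in the spirit of multi-task learning. This is our own illustration, not the paper's implementation; the toy data, network names, and the weighting factor lam are assumptions made for the example.

```python
# Minimal sketch of contextual learning as a regularized objective
# (hypothetical setup; not the paper's reference implementation).
# A shared encoder is trained on the main task while an auxiliary
# head predicts contextual data, implementing the prior that a good
# representation for the task should also explain the context.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: inputs x, task targets y, and contextual data c.
# c is neither an input nor an output of the target function.
n, d_in, d_ctx = 256, 10, 3
x = torch.randn(n, d_in)
y = (x[:, :2].sum(dim=1, keepdim=True) > 0).float()  # main-task labels
c = x[:, :3] + 0.1 * torch.randn(n, d_ctx)           # correlated context

encoder = nn.Sequential(nn.Linear(d_in, 32), nn.ReLU())
task_head = nn.Linear(32, 1)         # predicts y
context_head = nn.Linear(32, d_ctx)  # predicts contextual data c

params = (list(encoder.parameters())
          + list(task_head.parameters())
          + list(context_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-2)

task_loss_fn = nn.BCEWithLogitsLoss()
ctx_loss_fn = nn.MSELoss()
lam = 0.5  # weight of the contextual prior (assumed hyperparameter)

for step in range(200):
    z = encoder(x)
    loss = (task_loss_fn(task_head(z), y)
            + lam * ctx_loss_fn(context_head(z), c))
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time only the path encoder -> task_head is used; the
# contextual data are required during training only.
```

Because the contextual head is discarded at test time, the context acts purely as a regularizer on the learned representation, consistent with the abstract's point that contextual data belong to neither the input nor the output space of the final function.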
