A Bayesian/information theoretic model of bias learning

In this paper the problem of learning appropriate bias for an environment of related tasks is examined from a Bayesian perspective. The environment of related tasks is shown to be naturally modelled by an {\em objective} prior distribution: sampling from the objective prior corresponds to sampling different learning tasks from the environment. It is argued that in many common machine learning problems, although the true (objective) prior is not known, the learner does have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by sampling tasks from the objective prior. Bounds are given on the amount of information required to learn a task when it is learnt simultaneously with several other tasks. The bounds show that if the learner has little knowledge of the true prior and the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous.
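
To make the setup concrete, the following is a minimal sketch of the kind of hierarchical Bayesian inference the abstract describes, assuming a toy environment of biased-coin tasks whose objective prior is a Beta distribution. All names, candidate priors, and sample sizes below are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)

# Illustrative setup: each "task" is a coin with unknown bias theta, and the
# environment's objective prior over theta is a Beta distribution. The learner
# knows only a set of candidate priors containing the true one.
candidate_priors = [(1.0, 1.0), (2.0, 8.0), (8.0, 2.0)]  # (alpha, beta) candidates
true_prior = (2.0, 8.0)                                   # the objective prior
n_tasks, m_flips = 20, 5

# Sample tasks from the objective prior, then data from each task.
thetas = rng.beta(*true_prior, size=n_tasks)
heads = rng.binomial(m_flips, thetas)

def log_marginal(alpha, beta, h, m):
    # Log marginal likelihood of h heads in m flips under a Beta(alpha, beta)
    # prior (Beta-Binomial), up to a binomial coefficient that is the same for
    # every candidate prior and therefore cancels in the posterior below.
    return betaln(alpha + h, beta + m - h) - betaln(alpha, beta)

# Posterior over the candidate priors given all sampled tasks, assuming a
# uniform hyperprior over the candidates.
log_post = np.array([sum(log_marginal(a, b, h, m_flips) for h in heads)
                     for (a, b) in candidate_priors])
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(dict(zip(candidate_priors, post.round(3))))  # mass should concentrate on (2, 8)

As more tasks are sampled, the posterior mass concentrates on the true hyperparameters; this is the sense in which sampling multiple tasks allows the learner to identify the objective prior.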
