What Bayes has to Say about the Evidence Procedure

The ``evidence'' procedure for setting hyperparameters is essentially the same as the techniques of ML-II and generalized maximum likelihood. Unlike those older techniques, however, the evidence procedure has been justified (and used) as an approximation to the hierarchical Bayesian calculation. We use several examples to explore the validity of this justification. We then derive upper and (often large) lower bounds on the difference between the evidence procedure's answer and the hierarchical Bayesian answer, for many different quantities. We also touch on related subjects, such as the close relationship between the evidence procedure and maximum likelihood, and the self-consistency of deriving priors by ``first-principles'' arguments that do not set the values of hyperparameters.
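
To make the comparison concrete, the two calculations can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: $D$ is the data, $w$ the parameters, and $\alpha$ the hyperparameter(s). The hierarchical Bayesian answer integrates over $\alpha$, while the evidence procedure fixes $\alpha$ at the value that maximizes the evidence $P(D \mid \alpha)$.

```latex
% Hierarchical Bayesian posterior: marginalize over the hyperparameter
\[
  P(w \mid D) = \int P(w \mid D, \alpha)\, P(\alpha \mid D)\, d\alpha ,
  \qquad
  P(\alpha \mid D) \propto P(D \mid \alpha)\, P(\alpha) .
\]

% Evidence procedure: maximize the evidence, then condition on the maximizer
\[
  \hat{\alpha} = \arg\max_{\alpha} P(D \mid \alpha),
  \qquad
  P(D \mid \alpha) = \int P(D \mid w)\, P(w \mid \alpha)\, dw ,
  \qquad
  P(w \mid D) \approx P(w \mid D, \hat{\alpha}) .
\]
```

In this sketch the approximation would be exact only if $P(\alpha \mid D)$ were concentrated entirely at $\hat{\alpha}$; the bounds mentioned above concern how far the two answers can differ when it is not.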
