Bayesian invariant measurements of generalisation for continuous distributions

A family of measurements of generalisation is proposed for estimators of continuous distributions. In particular, they apply to neural network learning rules associated with continuous neural networks. The optimal estimators (learning rules) in this sense are Bayesian decision methods with information divergence as the loss function. The Bayesian framework guarantees internal coherence of such measurements, while the information geometric loss function guarantees invariance. The theoretical solution for the optimal estimator is derived by a variational method. It is applied to the family of Gaussian distributions and the implications are discussed. This is one in a series of technical reports on this topic; it generalises the results of \cite{Zhu95:prob.discrete} to continuous distributions and serves as a concrete example of a larger picture \cite{Zhu95:generalisation}.

[1] Radford M. Neal. Bayesian Learning via Stochastic Dynamics, 1992, NIPS.

[2] T. Loredo. From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics, 1990.

[3] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.

[4] S. Eguchi. Second Order Efficiency of Minimum Contrast Estimators in a Curved Exponential Family, 1983.

[5] Shun-ichi Amari, et al. Differential geometrical theory of statistics, 1987.

[6] 甘利 俊一. Differential geometry in statistical inference, 1987.

[7] Paul Marriott, et al. Preferred Point Geometry and Statistical Manifolds, 1993.

[8] E. Pitman, et al. Sufficient statistics and intrinsic accuracy, 1936, Mathematical Proceedings of the Cambridge Philosophical Society.

[9] Huaiyu Zhu, et al. Bayesian invariant measurements of generalisation for discrete distributions, 1995.

[10] S. Amari. Differential Geometry of Curved Exponential Families-Curvatures and Information Loss, 1982.

[11] H. Akaike. The Interpretation of Improper Prior Distributions as Limits of Data Dependent Proper Prior Distributions, 1980.

[12] Howard Raiffa, et al. Applied Statistical Decision Theory, 1961.

[13] A. Dempster. Elements of Continuous Multivariate Analysis, 1969.

[14] Rory A. Fisher, et al. Theory of Statistical Estimation, 1925, Mathematical Proceedings of the Cambridge Philosophical Society.

[15] Sufficient Statistics with Nuisance Parameters, 1956.

[16] Gerald S. Rogers, et al. Mathematical Statistics: A Decision Theoretic Approach, 1967.

[17] Donald Fraser, et al. On Sufficiency and the Exponential Family, 1963.

[18] F. Yates. Contributions to Mathematical Statistics, 1951, Nature.

[19] R. Fisher, et al. On the Mathematical Foundations of Theoretical Statistics, 1922.

[20] R. Fisher. Two New Properties of Mathematical Likelihood, 1934.

[21] M. Kendall. Theoretical Statistics, 1956, Nature.

[22] A. N. Kolmogorov, et al. Foundations of the theory of probability, 1960.

[23] M. Stone, et al. Marginalization Paradoxes in Bayesian and Structural Inference, 1973.

[24] Halbert White, et al. Learning in Artificial Neural Networks: A Statistical Perspective, 1989, Neural Computation.

[25] L. M. M.-T. Theory of Probability, 1929, Nature.

[26] O. E. Barndorff-Nielsen. Likelihood and Observed Geometries, 1986.

[27] L. J. Savage, et al. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics, 1949.