This article introduces the book "Algebraic Geometry and Statistical Learning Theory." A parametric model in statistics or a learning machine in information science is called singular if it is not identifiable or if its Fisher information matrix is not positive definite. Although many statistical models and learning machines are singular, their statistical properties have remained unknown. In this book, an algebraic geometrical method is established on which a new statistical theory for singular models can be constructed. Four main formulas are proved. First, any log likelihood function can be represented in a common standard form, based on resolution of singularities. Second, the asymptotic behavior of the Bayes marginal likelihood is derived from zeta function theory. Third, asymptotic expansions of the Bayes generalization and training errors are proved, which make possible a widely applicable information criterion for singular models. Finally, the symmetry of the generalization and training errors in the maximum a posteriori method is proved. The book explains algebraic geometry for non-mathematicians and introduces concrete, applicable, and nontrivial formulas. It is also shown theoretically that, in singular models, Bayes estimation is more appropriate than one-point estimation, even asymptotically.

1 Outline of the book

A parametric model in statistics or a learning machine in information science is called singular if the map from the parameter to the probability distribution is not one-to-one, or if its Fisher information matrix is not positive definite. Many statistical models are singular, for example artificial neural networks, reduced rank regressions, normal mixtures, binomial mixtures, hidden Markov models, stochastic context-free grammars, and Bayesian networks. In general, if a statistical model contains a hierarchical structure, sub-modules, or hidden variables, then it is singular. If a statistical model is singular, its log likelihood function cannot be approximated by any quadratic form, with the result that the conventional statistical theory of regular models does not hold: the Cramér-Rao inequality loses its meaning, the maximum likelihood estimator is not asymptotically normal, and the Bayes a posteriori distribution cannot be approximated by any normal distribution. Moreover, AIC does not correspond to the asymptotic average generalization error, and BIC does not equal the asymptotic Bayes marginal likelihood. Singular models have been difficult to study because their log likelihood functions contain so many types of singularities. A minimal example of a singular model is sketched below.
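To make the definition concrete, the following is a minimal sketch of arguably the simplest singular model, a rank-one regression y = a*b*x + noise. This toy model and every name in the code are illustrative, not taken from the book. The parameter (a, b) is not identifiable, since (a, b) and (t*a, b/t) define the same distribution for any t != 0, and the Fisher information matrix has a zero eigenvalue at every parameter value.

```python
import numpy as np

# Toy singular model (illustrative, not from the book):
#   y | x ~ N(a*b*x, 1), parameters w = (a, b).
# The map (a, b) -> distribution is not one-to-one, because
# (a, b) and (t*a, b/t) give the same regression function.

def fisher_matrix(a, b, Ex2=1.0):
    # For N(a*b*x, 1) the score components are
    #   d/da log p = (y - a*b*x) * b * x,
    #   d/db log p = (y - a*b*x) * a * x,
    # so the Fisher information is E[x^2] * [[b^2, a*b], [a*b, a^2]].
    return Ex2 * np.array([[b * b, a * b],
                           [a * b, a * a]])

for a, b in [(0.0, 0.0), (1.0, 2.0), (3.0, -0.5)]:
    I = fisher_matrix(a, b)
    print((a, b), "eigenvalues:", np.linalg.eigvalsh(I))
    # One eigenvalue is always 0: the matrix is never positive
    # definite, so this model is singular everywhere.
```

Along the zero eigendirection the log likelihood is flatter than any quadratic, which is exactly why the Laplace approximation, and with it the regular asymptotic theory, breaks down.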
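The first two formulas can be stated compactly. Let K(w) be the Kullback-Leibler divergence from the true distribution q to the model p(x|w), let φ(w) be the prior, and let d be the parameter dimension. The following display is my sketch of the two results in the notation standard for this theory:

```latex
% First formula (standard form): by resolution of singularities there is a
% map w = g(u) such that, in each local chart,
K(g(u)) = u_1^{2k_1} u_2^{2k_2} \cdots u_d^{2k_d},
% and the log likelihood ratio of n samples takes the common standard form
\sum_{i=1}^{n} \log \frac{q(X_i)}{p(X_i \mid g(u))}
  = n\,u^{2k} - \sqrt{n}\,u^{k}\,\xi_n(u),
% where u^{2k} denotes the monomial above and \xi_n is an empirical process
% converging to a Gaussian process.

% Second formula (marginal likelihood): the largest pole z = -\lambda, of
% multiplicity m, of the zeta function
\zeta(z) = \int K(w)^{z}\,\varphi(w)\,dw
% determines the asymptotic Bayes free energy (minus log marginal likelihood):
F_n = n S_n + \lambda \log n - (m-1)\log\log n + O_p(1),
% where S_n is the empirical entropy of the true distribution.
```

For a regular model λ = d/2 and m = 1, which recovers BIC; for singular models λ ≤ d/2 in the realizable case, which is why BIC overstates the penalty and does not equal the asymptotic Bayes marginal likelihood.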
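The third formula is what makes a widely applicable information criterion (WAIC) possible: the Bayes generalization loss can be estimated from the training data alone as the training loss plus the functional variance. A minimal sketch, assuming posterior draws are already available (e.g., from MCMC); the function name and array layout are mine, not the book's:

```python
import numpy as np

def waic(loglik):
    """WAIC from posterior samples (per-datum, generalization-loss scale).

    loglik : array of shape (S, n), loglik[s, i] = log p(x_i | w_s)
             for S posterior draws w_s and n data points x_i.
    """
    S = loglik.shape[0]
    # Training loss: minus the log pointwise posterior predictive
    # density, computed with log-sum-exp for numerical stability.
    lppd = np.logaddexp.reduce(loglik, axis=0) - np.log(S)
    # Functional variance: posterior variance of the log likelihood at
    # each data point; this is the singular-model correction term.
    v = loglik.var(axis=0, ddof=1)
    return -lppd.mean() + v.mean()

# Example with dummy numbers: 1000 posterior draws, 50 data points.
rng = np.random.default_rng(0)
print(waic(rng.normal(-1.0, 0.1, size=(1000, 50))))
```

Unlike AIC, this estimate remains valid when the posterior is far from normal, which is precisely the singular case.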
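The fourth formula concerns one-point estimation. Let G_n be the Kullback-Leibler generalization error of the maximum a posteriori estimator and T_n the corresponding training error. Schematically (my paraphrase of the symmetry result, with c a model-dependent constant):

```latex
% Symmetry of generalization and training errors for MAP (schematic):
\lim_{n\to\infty} n\,\mathbb{E}[G_n] = c,
\qquad
\lim_{n\to\infty} n\,\mathbb{E}[T_n] = -c .
```

For regular models c = d/2, the balance that AIC exploits; for singular models c is governed by the supremum of a Gaussian process and can be strictly larger, which is one theoretical sense in which Bayes estimation is more appropriate than one-point estimation, even asymptotically.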