Almost All Learning Machines are Singular

A learning machine is called singular if its Fisher information matrix is singular. Almost all learning machines used in information processing are singular: layered neural networks, normal mixtures, binomial mixtures, Bayesian networks, hidden Markov models, Boltzmann machines, stochastic context-free grammars, and reduced rank regressions are all singular. In a singular learning machine, the likelihood function cannot be approximated by any quadratic form of the parameter. Moreover, neither the distribution of the maximum likelihood estimator nor the Bayesian posterior distribution converges to a normal distribution, even as the number of training samples tends to infinity. Conventional statistical learning theory therefore does not hold for singular learning machines. This paper establishes a new mathematical foundation for singular learning machines. We show that, by using resolution of singularities, the likelihood function can be written in a standard form, from which the asymptotic behavior of the generalization errors of maximum likelihood estimation and Bayesian estimation can be proved. The result provides a basis on which training algorithms for singular learning machines can be devised and optimized.
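As a brief illustration (a minimal sketch, not drawn from the paper itself), assume the simplest reduced rank regression model p(x | a, b) = N(x | ab, 1) with scalar parameters a, b and unit variance. A direct computation shows why the Fisher information matrix of such a model is singular at every parameter:

% Illustrative example only (assumed for exposition, not from the original paper):
% x ~ N(ab, 1) with scalar parameters a, b; requires amsmath for pmatrix/tfrac.
\[
\log p(x \mid a,b) = -\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}(x - ab)^{2},
\qquad
\partial_a \log p = (x - ab)\,b,
\quad
\partial_b \log p = (x - ab)\,a,
\]
\[
I(a,b) = \mathrm{E}\!\left[\nabla\log p\,(\nabla\log p)^{\top}\right]
       = \begin{pmatrix} b^{2} & ab \\ ab & a^{2} \end{pmatrix},
\qquad
\det I(a,b) = a^{2}b^{2} - (ab)^{2} = 0 .
\]

If the true distribution is N(0, 1), the set of true parameters {(a, b) : ab = 0} is the union of the two coordinate axes, a variety that is singular at the origin; no quadratic expansion of the log likelihood is valid there, which is exactly the situation that the standard form obtained by resolution of singularities is intended to handle.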
