On-Line Learning Dynamics of Multilayer Perceptrons with Unidentifiable Parameters

In the over-realizable learning scenario of multilayer perceptrons, in which the student network has more hidden units than the true or optimal network, some of the weight parameters are unidentifiable. In this case, the teacher network is represented by a union of optimal subspaces of the parameter space. These optimal subspaces give rise to singularities, which are known to affect the estimation performance of neural networks. Using statistical mechanics, we investigate the on-line learning dynamics of two-layer neural networks in the over-realizable scenario with unidentifiable parameters. We show that the convergence speed depends strongly on the initial parameter conditions. We also show that there is a quasi-plateau around the optimal subspace, which differs from the well-known plateaus caused by permutation symmetry. In addition, we discuss the properties of the final learning state and relate them to the singular structures.
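
The over-realizable setting can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in the paper: it assumes a soft committee machine with erf hidden units and unit output weights, Gaussian inputs, a teacher with one hidden unit, a student with two, and plain on-line gradient descent on the squared error. The function names, network sizes, learning rate, and initialization scale are all illustrative assumptions.

```python
import math
import random


def g(u):
    # Assumed hidden-unit activation: erf(u / sqrt(2)).
    return math.erf(u / math.sqrt(2.0))


def g_prime(u):
    # Derivative of g with respect to u.
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * u * u)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def online_step(student, teacher, x, eta):
    """One on-line gradient step on a single fresh example x.

    Returns the squared error measured before the update.
    """
    n = len(x)
    fields = [dot(w, x) for w in student]
    target = sum(g(dot(b, x)) for b in teacher)
    err = target - sum(g(h) for h in fields)
    for w, h in zip(student, fields):
        scale = eta / n * err * g_prime(h)
        for i in range(n):
            w[i] += scale * x[i]
    return 0.5 * err * err


def run(n=10, k=2, m=1, eta=0.3, steps=6000, init_scale=0.1, seed=0):
    """Train a k-unit student on an m-unit teacher (k > m: over-realizable)."""
    rng = random.Random(seed)
    teacher = []
    for _ in range(m):
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(dot(b, b))
        teacher.append([v / norm for v in b])  # unit-norm teacher vector
    # Small random student weights; the initial condition is an assumption.
    student = [[init_scale * rng.gauss(0.0, 1.0) / math.sqrt(n)
                for _ in range(n)] for _ in range(k)]
    errors = []
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # fresh example each step
        errors.append(online_step(student, teacher, x, eta))
    return errors


if __name__ == "__main__":
    errs = run()
    window = 200
    print("mean error, first window:", sum(errs[:window]) / window)
    print("mean error, last window: ", sum(errs[-window:]) / window)
```

Because each error is measured on a fresh example before the update, a running average of the returned values gives a rough estimate of the generalization error. Varying `init_scale` and `seed` is one way to probe how the trajectory depends on the initial parameters, though a finite-size simulation like this only approximates the dynamics the abstract describes.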
