Estimating the Number of Components in a Mixture of Multilayer Perceptrons

The Bayesian information criterion (BIC) is widely used by the neural-network community for model selection, although its convergence properties are not always theoretically established. In this paper we focus on estimating the number of components in a mixture of multilayer perceptrons and prove the convergence of the BIC criterion in this framework. The penalized marginal-likelihood criterion for mixture models and hidden Markov models introduced by Keribin [Consistent estimation of the order of mixture models, Sankhyā Indian J. Stat. 62 (2000) 49–66] and Gassiat [Likelihood ratio inequalities with applications to various mixtures, Ann. Inst. Henri Poincaré 38 (2002) 897–906], respectively, is extended to mixtures of multilayer perceptrons, for which a penalized-likelihood criterion is proposed. We prove its convergence under hypotheses that essentially involve the bracketing entropy of the generalized score-function class, and we illustrate the result with some numerical examples.
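For orientation, criteria of this type generically select the number of components by maximizing a penalized log-likelihood; the following display is only a sketch in standard BIC form, with notation ($\ell_n$, $d(p)$, $P$) chosen here rather than taken from the paper:

\[
\hat{p}_n \;=\; \operatorname*{arg\,max}_{1 \le p \le P} \Big\{ \ell_n(p) - \frac{d(p)}{2}\,\log n \Big\},
\]

where $\ell_n(p)$ is the maximized log-likelihood of the mixture with $p$ components, $d(p)$ the number of free parameters of that model, $n$ the sample size, and $P$ a fixed upper bound on the number of components.

As a purely illustrative analogue of the numerical examples, the same selection scheme can be run with off-the-shelf Gaussian mixtures (not the paper's mixtures of multilayer perceptrons), for instance with scikit-learn:

    # Hedged sketch: BIC-based selection of the number of mixture components,
    # illustrated with a Gaussian mixture instead of a mixture of MLPs.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Simulated sample from a two-component mixture.
    x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                        rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

    # Fit each candidate model and keep the one minimizing BIC
    # (equivalently, maximizing the penalized log-likelihood above).
    bic = {p: GaussianMixture(n_components=p, random_state=0).fit(x).bic(x)
           for p in range(1, 6)}
    p_hat = min(bic, key=bic.get)
    print(bic, "selected number of components:", p_hat)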