Estimating the Number of Components in a Mixture of Multilayer Perceptrons

The Bayesian information criterion (BIC) is widely used by the neural-network community for model selection, although its convergence properties are not always theoretically established. In this paper we focus on estimating the number of components in a mixture of multilayer perceptrons and prove the convergence of the BIC criterion in this framework. The penalized marginal-likelihood criterion for mixture models and hidden Markov models introduced by Keribin [Consistent estimation of the order of mixture models, Sankhyā Indian J. Stat. 62 (2000) 49–66] and Gassiat [Likelihood ratio inequalities with applications to various mixtures, Ann. Inst. Henri Poincaré 38 (2002) 897–906], respectively, is extended to mixtures of multilayer perceptrons, for which a penalized-likelihood criterion is proposed. We prove its convergence under hypotheses that essentially involve the bracketing entropy of the generalized score-function class, and we illustrate the result with some numerical examples.
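For orientation, criteria of this type generically select the number of components by maximizing a penalized log-likelihood; the following display is only a sketch in standard BIC form, with notation ($\ell_n$, $d(p)$, $P$) chosen here rather than taken from the paper:

\[
\hat{p}_n \;=\; \operatorname*{arg\,max}_{1 \le p \le P} \Big\{ \ell_n(p) - \frac{d(p)}{2}\,\log n \Big\},
\]

where $\ell_n(p)$ is the maximized log-likelihood of the mixture with $p$ components, $d(p)$ the number of free parameters of that model, $n$ the sample size, and $P$ a fixed upper bound on the number of components.

As a purely illustrative analogue of the numerical examples, the same selection scheme can be run with off-the-shelf Gaussian mixtures (not the paper's mixtures of multilayer perceptrons), for instance with scikit-learn:

    # Hedged sketch: BIC-based selection of the number of mixture components,
    # illustrated with a Gaussian mixture instead of a mixture of MLPs.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Simulated sample from a two-component mixture.
    x = np.concatenate([rng.normal(-2.0, 1.0, 300),
                        rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

    # Fit each candidate model and keep the one minimizing BIC
    # (equivalently, maximizing the penalized log-likelihood above).
    bic = {p: GaussianMixture(n_components=p, random_state=0).fit(x).bic(x)
           for p in range(1, 6)}
    p_hat = min(bic, key=bic.get)
    print(bic, "selected number of components:", p_hat)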