Learning Regularization Parameters of Radial Basis Functions in Embedded Likelihoods Space

Neural networks with radial basis activation functions are typically trained in two phases: the first constructs the hidden layer, while the second finds the output-layer weights. Constructing the hidden layer involves defining the number of units as well as their centers and widths. The output layer can then be trained with least-squares methods, usually with a regularization term. This work proposes an approach that builds the whole network using information extracted directly from the training data projected into the space formed by the likelihood functions. RBF networks for pattern classification can thus be trained with minimal external intervention.
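The two-phase scheme described above can be sketched as follows. This is a generic baseline, not the paper's likelihood-space method: the centers are simply taken as given (in practice they often come from clustering), the width and the regularization parameter `lam` are assumed fixed by the user, and the output weights are found by ridge-regularized least squares.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Phase 1 output: Gaussian RBF activations of the hidden layer.

    phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2))
    """
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_rbf(X, y, centers, width, lam):
    """Phase 2: regularized least squares for the output-layer weights.

    Solves (Phi^T Phi + lam * I) w = Phi^T y.
    """
    Phi = rbf_design(X, centers, width)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def predict(X, centers, width, w):
    return rbf_design(X, centers, width) @ w

# Toy usage: XOR classification with labels in {-1, +1},
# using the four training points themselves as centers.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)
w = train_rbf(X, y, centers=X, width=0.5, lam=1e-3)
pred = predict(X, X, 0.5, w)
```

The free parameters here (`width`, `lam`) are exactly the quantities the paper proposes to set automatically from the projected training data instead of by external tuning.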
