A majorization-minimization algorithm for (multiple) hyperparameter learning

We present a general Bayesian framework for hyperparameter tuning in L2-regularized supervised learning models. Paradoxically, our algorithm works by first analytically integrating out the hyperparameters from the model. We find a local optimum of the resulting non-convex optimization problem efficiently using a majorization-minimization (MM) algorithm, in which the non-convex problem is reduced to a series of convex L2-regularized parameter estimation tasks. The principal appeal of our method is its simplicity: the updates that define the L2-regularized subproblem at each step are trivial to implement (or even perform by hand), and each subproblem can be solved efficiently by adapting existing solvers. Empirical results on a variety of supervised learning models show that our algorithm is competitive with both grid-search and gradient-based algorithms, but is more efficient and far easier to implement.
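To make the shape of the MM loop concrete, the following is a minimal sketch for the special case of squared loss with one hyperparameter per weight. The specific objective, constants, and update rule below are illustrative assumptions, not the paper's exact derivation; only the overall structure, alternating between a closed-form ridge subproblem and a trivial hyperparameter reweighting, mirrors the description above.

```python
import numpy as np

def mm_hyperparameter_learning(X, y, n_iters=20, eps=1e-6):
    """Illustrative MM loop for learning per-feature L2 regularization weights.

    Assumed simplified objective: 0.5*||Xw - y||^2 + 0.5*sum_j log(w_j^2 + eps),
    i.e. the log terms that appear after the hyperparameters are integrated out.
    Each MM step majorizes the log penalty by its tangent at the current iterate,
    so every subproblem is an ordinary weighted ridge regression.
    """
    n, d = X.shape
    lam = np.ones(d)   # current effective per-feature regularization weights
    w = np.zeros(d)
    for _ in range(n_iters):
        # Convex subproblem: weighted ridge regression, solved in closed form.
        A = X.T @ X + np.diag(lam)
        w = np.linalg.solve(A, X.T @ y)
        # Hyperparameter update: tangent majorization of log(w_j^2 + eps)
        # yields this simple reweighting (constants here are illustrative).
        lam = 1.0 / (w ** 2 + eps)
    return w, lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    w_true = np.zeros(10)
    w_true[:3] = [2.0, -1.0, 0.5]
    y = X @ w_true + 0.1 * rng.standard_normal(100)
    w_hat, lam_hat = mm_hyperparameter_learning(X, y)
    print(np.round(w_hat, 2))
```

As the sketch suggests, the per-iteration work is dominated by a standard L2-regularized fit, which is why existing solvers can be reused with little modification.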
