The Knowledge Gradient Policy Using A Sparse Additive Belief Model

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high-dimensional sparse belief functions, where there are hundreds or even thousands of features but only a small fraction of them carry explanatory power. The goal is to identify the sparsity pattern and select the best alternative before a finite measurement budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with a group Lasso penalty; the policy is a novel hybrid of Bayesian R&S and frequentist learning. In particular, by combining it with B-spline basis expansion, the method generalizes to the nonparametric sparse additive model (KGSpAM) and the functional ANOVA model. Theoretically, we provide estimation error bounds for the posterior mean estimate and the functional estimate. Controlled experiments show that the algorithm efficiently learns the correct set of nonzero parameters even when the model is embedded with hundreds of dummy parameters, and that it outperforms the knowledge gradient policy for a linear belief model.
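
For orientation, the quantity maximized at each measurement step is the standard knowledge-gradient factor from the R&S literature, sketched below for a generic linear belief. The notation (state S^n, posterior means mu^n, measurement decision x^n) is illustrative rather than the paper's own; the KGSpLin and KGSpAM policies adapt this quantity to a sparse linear or additive belief whose support is re-estimated via the group Lasso.

\[
\nu^{\mathrm{KG},n}(x) \;=\; \mathbb{E}\!\left[\, \max_{x'} \mu^{n+1}_{x'} \;\middle|\; S^n,\; x^n = x \right] \;-\; \max_{x'} \mu^{n}_{x'},
\qquad
x^{\mathrm{KG},n} \in \operatorname*{arg\,max}_{x} \nu^{\mathrm{KG},n}(x).
\]

Under a linear belief, \( \mu^n_x = \phi(x)^\top \theta^n \) for a feature map \( \phi \) and posterior coefficient mean \( \theta^n \), so the factor measures the expected improvement in the value of the best estimated alternative from one additional noisy measurement of alternative x.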
