Optimization for Machine Learning

The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields.

Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.
