Optimization Models for Machine Learning: A Survey

Abstract This paper surveys the machine learning literature and presents several commonly used machine learning approaches in an optimization framework. In particular, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as for emerging applications in machine teaching, empirical model learning, and Bayesian network structure learning. Such models can benefit from advances in numerical optimization techniques, which have already played a distinctive role in several machine learning settings. The strengths and shortcomings of these models are discussed, and potential research directions and open problems are highlighted.
