Paper Information - Optimization Models for Machine Learning: A Survey

Optimization Models for Machine Learning: A Survey

Abstract: This paper surveys the machine learning literature and presents several commonly used machine learning approaches in an optimization framework. In particular, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as for emerging applications in machine teaching, empirical model learning, and Bayesian network structure learning. Such models can benefit from advances in numerical optimization techniques, which have already played a distinctive role in several machine learning settings. The strengths and shortcomings of these models are discussed, and potential research directions and open problems are highlighted.
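To make concrete what it means to present a learning approach in an optimization framework, the following minimal sketch writes one-feature ridge regression as an explicit minimization problem, min_w sum_i (y_i - w*x_i)^2 + lam*w^2, and solves it two ways. It is an illustration only and is not taken from the paper: the data, the penalty weight `lam`, and all function names are assumptions chosen for the example.

```python
# Illustrative only: one-feature ridge regression written as an explicit
# optimization problem, min_w sum_i (y_i - w*x_i)^2 + lam * w^2.
# The data, `lam`, and all function names are hypothetical.

def ridge_objective(w, xs, ys, lam):
    """Regularized squared-error objective for a single weight w."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2

def ridge_closed_form(xs, ys, lam):
    """Minimizer obtained by setting the derivative of the objective to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def gradient_descent(xs, ys, lam, step=0.01, iters=2000):
    """Numerical minimizer of the same objective, for comparison."""
    w = 0.0
    for _ in range(iters):
        # Derivative of the objective with respect to w.
        grad = sum(-2 * x * (y - w * x) for x, y in zip(xs, ys)) + 2 * lam * w
        w -= step * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
lam = 1.0

w_exact = ridge_closed_form(xs, ys, lam)
w_gd = gradient_descent(xs, ys, lam)
```

Both routes minimize the same objective, so `w_exact` and `w_gd` should agree up to numerical tolerance. The step size used here is small enough for this particular data set and would need tuning for other inputs.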


[34] M. Florian,et al. THE NONLINEAR BILEVEL PROGRAMMING PROBLEM: FORMULATIONS, REGULARITY AND OPTIMALITY CONDITIONS , 1993 .

[35] Dimitris Bertsimas,et al. Characterization of the equivalence of robustification and regularization in linear and matrix regression , 2017, Eur. J. Oper. Res..

[36] Ender Özcan,et al. A review on the self and dual interactions between machine learning and optimisation , 2019, Progress in Artificial Intelligence.

[37] Dimitris Bertsimas,et al. OR Forum - An Algorithmic Approach to Linear Regression , 2016, Oper. Res..

[38] Amir Globerson,et al. Nightmare at test time: robust learning by feature deletion , 2006, ICML.

[39] Tommi S. Jaakkola,et al. Learning Bayesian Network Structure using LP Relaxations , 2010, AISTATS.

[40] Lin Bai,et al. Learning More Robust Features with Adversarial Training , 2018, ArXiv.

[41] Velibor V. Misic,et al. Optimization of Tree Ensembles , 2017, Oper. Res..

[42] Shuichi Kawano,et al. Sparse principal component regression for generalized linear models , 2016, Comput. Stat. Data Anal..

[43] Yancong Deng,et al. Few Shot Learning Based on the Street View House Numbers (SVHN) Dataset , 2021 .

[44] Been Kim,et al. Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[45] Le Song,et al. Learning to Branch in Mixed Integer Programming , 2016, AAAI.

[46] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[47] Bernhard Schölkopf,et al. A tutorial on support vector regression , 2004, Stat. Comput..

[48] Mohammad Azad,et al. Minimization of Decision Tree Average Depth for Decision Tables with Many-valued Decisions , 2014, KES.

[49] Xiaonan Li,et al. Operations research and data mining , 2008, Eur. J. Oper. Res..

[50] Dimitris Bertsimas,et al. From Predictive to Prescriptive Analytics , 2014, Manag. Sci..

[51] Christopher Meek,et al. Adversarial learning , 2005, KDD '05.

[52] Justo Puerto,et al. Locating hyperplanes to fitting set of points: A general framework , 2018, Comput. Oper. Res..

[53] Andrea Lodi,et al. Learning MILP Resolution Outcomes Before Reaching Time-Limit , 2019, CPAIOR.

[54] Kurt Hornik,et al. Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.

[55] H. Crowder,et al. Cluster Analysis: An Application of Lagrangian Relaxation , 1979 .

[56] Thore Graepel,et al. Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[57] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[58] Yann LeCun,et al. Generalization and network design strategies , 1989 .

[59] Y. LeCun,et al. Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[60] Zoubin Ghahramani,et al. Unifying linear dimensionality reduction , 2014, 1406.0873.

[61] Loo Hay Lee,et al. Enhancing transportation systems via deep learning: A survey , 2019, Transportation Research Part C: Emerging Technologies.

[62] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[63] D. Bertsimas,et al. Best Subset Selection via a Modern Optimization Lens , 2015, 1507.03133.

[64] Akihiko Konagaya,et al. Improvements to the cluster Newton method for underdetermined inverse problems , 2015, J. Comput. Appl. Math..

[65] Uri Shaham,et al. Understanding adversarial training: Increasing local stability of supervised models through robust optimization , 2015, Neurocomputing.

[66] Ken Kobayashi,et al. BEST SUBSET SELECTION FOR ELIMINATING MULTICOLLINEARITY , 2017 .

[67] P Baldi,et al. Enhanced Higgs boson to τ(+)τ(-) search with deep learning. , 2014, Physical review letters.

[68] Daniel Aloise,et al. A Model for Clustering Data from Heterogeneous Dissimilarities , 2016, Eur. J. Oper. Res..

[69] Ronald L. Rivest,et al. Constructing Optimal Binary Decision Trees is NP-Complete , 1976, Inf. Process. Lett..

[70] Kristin P. Bennett,et al. Model selection for primal SVM , 2011, Machine Learning.

[71] Vince D. Calhoun,et al. A kernel machine method for detecting higher order interactions in multimodal datasets: Application to schizophrenia , 2018, Journal of Neuroscience Methods.