Model-based feature selection for neural networks: A mixed-integer programming approach

In this work, we develop a novel input feature selection framework for ReLU-based deep neural networks (DNNs) that builds on a mixed-integer optimization approach. While the method is generally applicable to various classification tasks, we focus on finding input features for image classification for clarity of presentation. The idea is to use a trained DNN, or an ensemble of trained DNNs, to identify the salient input features. Input feature selection is formulated as a sequence of mixed-integer linear programming (MILP) problems, each of which finds a sparse set of inputs that maximizes the classification confidence of one category. These "inverse" problems are regularized by the number of inputs selected for each category and by distribution constraints. Numerical results on the well-known MNIST and FashionMNIST datasets show that the proposed input feature selection allows us to drastically reduce the input size to approximately 15% of the original while maintaining good classification accuracy. This allows us to design DNNs with significantly fewer connections, reducing computational effort and producing DNNs that are more robust to adversarial attacks.
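
To make the type of MILP subproblem concrete, the following is a minimal sketch for a network with a single hidden ReLU layer, using a standard big-M encoding of the activations; the selection indicators $s_i$, sparsity budget $k$, and pre-activation bounds $L_j, U_j$ are illustrative assumptions rather than the paper's exact formulation, and the distribution constraints mentioned above are omitted.

$$
\begin{aligned}
\max_{x,\,h,\,\sigma,\,s} \quad & f_c(h) && \text{(confidence of target class } c\text{)}\\
\text{s.t.} \quad & h_j \ge w_j^\top x + b_j, && j = 1,\dots,m,\\
& h_j \le w_j^\top x + b_j - L_j(1-\sigma_j), && j = 1,\dots,m,\\
& 0 \le h_j \le U_j\,\sigma_j, \quad \sigma_j \in \{0,1\}, && j = 1,\dots,m,\\
& 0 \le x_i \le s_i, \quad s_i \in \{0,1\}, && i = 1,\dots,n,\\
& \textstyle\sum_{i=1}^{n} s_i \le k,
\end{aligned}
$$

where $f_c$ is the affine output-layer logit of class $c$, $\sigma_j$ indicates whether ReLU unit $j$ is active, $s_i$ selects input pixel $i$ (with inputs scaled to $[0,1]$), and $k$ caps the number of selected features. Under this reading, one such problem is solved per category and the selected pixels are aggregated across categories to form the salient input set.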
