Consistent Feature Selection for Neural Networks via Adaptive Group Lasso

One main obstacle to the wide adoption of deep learning in the medical and engineering sciences is its lack of interpretability. While neural network models are strong predictive tools, they typically provide little insight into which input features drive their predictions. To address this issue, many regularization procedures that drop non-significant features during training have been proposed. Unfortunately, the lack of theoretical guarantees casts doubt on the reliability of such pipelines. In this work, we propose and establish a theoretical guarantee for the use of the adaptive group lasso to select important features of neural networks. Specifically, we show that the resulting feature selection procedure is consistent for single-output feed-forward neural networks with one hidden layer and hyperbolic tangent activation function. We demonstrate its applicability on both simulated and real data.
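As a rough illustration of the idea described above (not the authors' implementation), the sketch below trains a one-hidden-layer tanh network with a group-lasso penalty in which each group collects the first-layer weights attached to one input feature; the adaptive weights are taken as inverse group norms from an initial unpenalized fit. The use of PyTorch, the hyperparameters, and the two-stage schedule are all illustrative assumptions.

```python
# Hedged sketch: adaptive group lasso feature selection for a single-output,
# one-hidden-layer tanh network. Hyperparameters and the two-stage schedule
# are assumptions for illustration, not the paper's exact procedure.
import torch
import torch.nn as nn

def group_norms(first_layer_weight):
    # Column j of the hidden-layer weight matrix holds every weight attached
    # to input feature j; its L2 norm is the norm of "group" j.
    return first_layer_weight.norm(dim=0)

def fit(X, y, adaptive_weights=None, lam=0.0, n_hidden=10, epochs=2000, lr=1e-2):
    n_features = X.shape[1]
    model = nn.Sequential(nn.Linear(n_features, n_hidden),
                          nn.Tanh(),
                          nn.Linear(n_hidden, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = mse(model(X), y)
        if lam > 0:
            norms = group_norms(model[0].weight)
            w = adaptive_weights if adaptive_weights is not None else torch.ones_like(norms)
            loss = loss + lam * (w * norms).sum()  # (adaptive) group lasso penalty
        loss.backward()
        opt.step()
    return model

# X, y are assumed to be torch tensors of shape (n, p) and (n, 1).
# Stage 1: unpenalized fit to obtain data-driven adaptive weights.
# Stage 2: adaptive group lasso fit; features whose group norm shrinks
#          (approximately) to zero are declared non-significant and dropped.
def select_features(X, y, lam=0.1, tol=1e-3):
    init = fit(X, y, lam=0.0)
    adaptive_weights = 1.0 / (group_norms(init[0].weight).detach() + 1e-8)
    final = fit(X, y, adaptive_weights=adaptive_weights, lam=lam)
    return (group_norms(final[0].weight) > tol).nonzero(as_tuple=True)[0]
```

In practice the penalty level (here `lam`) and the truncation threshold (`tol`) would be chosen by cross-validation or another tuning criterion; the values above are placeholders.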
