Statistical Aspects of High-Dimensional Sparse Artificial Neural Network Models

An artificial neural network (ANN) is an automatic way of capturing linear and nonlinear relationships, as well as spatial and other structural dependence, among features. Such models perform well in many application areas, such as classification and prediction from magnetic resonance imaging, spatial data, and computer vision tasks. Most commonly used ANNs assume that the training data are large relative to the dimension of the feature vector. In modern applications such as those above, however, the training sample size is often small, and may even be smaller than the dimension of the feature vector. In this paper, we consider a single-layer ANN classification model that is suitable for analyzing high-dimensional, low-sample-size (HDLSS) data. We investigate the theoretical properties of the sparse group lasso regularized neural network and show that, under mild conditions, its classification risk converges to the risk of the optimal Bayes classifier (universal consistency). Moreover, we propose a variation on the regularization term. A few examples from popular research fields are also provided to illustrate the theory and methods.
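To make the regularizer concrete, the following is a minimal NumPy sketch of a sparse group lasso penalty applied to the input weights of a single hidden layer. The grouping of weights by input feature, the function name, and the hyperparameter values lam and alpha are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def sparse_group_lasso_penalty(W, lam=0.1, alpha=0.5):
    """Sparse group lasso penalty on a first-layer weight matrix W
    of shape (n_features, n_hidden).

    Each row of W (the weights fanning out from one input feature) is
    treated as a group, so the group term can zero out entire features
    while the l1 term induces sparsity within groups.

    lam is the overall regularization strength and alpha trades off the
    l1 (within-group) and group-lasso (between-group) terms; both are
    illustrative choices here.
    """
    group_norms = np.sqrt((W ** 2).sum(axis=1))           # ||w_g||_2 per feature
    group_term = np.sqrt(W.shape[1]) * group_norms.sum()  # group-lasso part
    l1_term = np.abs(W).sum()                             # lasso part
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)

# Toy usage: penalize a random 100-feature by 8-unit first layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 8))
print(sparse_group_lasso_penalty(W))
```

In an HDLSS setting this penalty would be added to the classification loss during training, so that most input features receive exactly zero first-layer weights while the remaining features keep only a few active hidden connections.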
