Deep Neural Networks for High Dimension, Low Sample Size Data

Deep neural networks (DNNs) have achieved breakthroughs in applications with large sample sizes. However, when facing high-dimension, low-sample-size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics, DNNs suffer from overfitting and high-variance gradients. In this paper, we propose a DNN model tailored to HDLSS data, named Deep Neural Pursuit (DNP). DNP selects a subset of the high-dimensional features to alleviate overfitting and averages gradients over multiple dropout masks to reduce their variance. As the first DNN method applied to HDLSS data, DNP combines high nonlinearity, robustness to high dimensionality, the ability to learn from a small number of samples, stability in feature selection, and end-to-end training. We demonstrate these advantages of DNP through empirical results on both synthetic and real-world biological datasets.
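The abstract points to two concrete mechanisms: greedy selection over input features and gradient averaging across multiple dropout masks. Below is a minimal sketch of those two ideas, not the authors' implementation: the network shape, the number of dropout draws, and the rule that scores a candidate feature by the magnitude of the averaged gradient on its first-layer weights are all illustrative assumptions made here.

```python
# Sketch (not the paper's code) of: (1) averaging gradients over several
# dropout masks to reduce their variance, and (2) scoring input features
# greedily from those averaged gradients. All sizes and the scoring rule
# are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 40, 1000                 # HDLSS regime: few samples, many features
X = torch.randn(n, d)
y = (X[:, :3].sum(dim=1, keepdim=True) > 0).float()  # label uses 3 features

net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Dropout(0.5),
                    nn.Linear(32, 1))
loss_fn = nn.BCEWithLogitsLoss()

def averaged_grads(num_masks=10):
    """Average parameter gradients over several dropout masks."""
    grads = [torch.zeros_like(p) for p in net.parameters()]
    for _ in range(num_masks):
        net.zero_grad()
        loss_fn(net(X), y).backward()   # each forward draws a fresh mask
        for g, p in zip(grads, net.parameters()):
            g += p.grad / num_masks
    return grads

# Rank candidate input features by the magnitude of the averaged gradient
# on their first-layer weights (a matching-pursuit-style heuristic assumed
# here for illustration).
g_input = averaged_grads()[0]           # grad of first Linear weight, (32, d)
scores = g_input.abs().sum(dim=0)       # one score per input feature
print("top-5 candidate features:", scores.topk(5).indices.tolist())
```

Because each forward pass draws a fresh dropout mask, a single backward pass yields a noisy gradient; averaging several draws lowers that variance at the cost of extra forward/backward passes, which is affordable when the sample size is small.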
