Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware

Artificial neural networks (ANNs) have attracted intense interest in the research community. Despite their success, modern ANNs are challenging to train and deploy on commodity hardware because of ever-increasing model sizes and the unprecedented growth in data volumes. Microarray data are particularly difficult for machine learning techniques to handle, owing to their very high dimensionality and small number of samples. Furthermore, specialized hardware such as graphics processing units (GPUs) is expensive. Sparse neural networks are a leading approach to addressing these challenges. However, off-the-shelf sparsity-inducing techniques either start from a pretrained dense model or enforce the sparse structure through binary masks over dense weights, so the training efficiency promised by sparsity is not realized in practice. In this paper, we introduce a technique for training truly sparse neural networks with a fixed parameter count throughout training. Our experimental results demonstrate that the method can be applied directly to high-dimensional data while achieving higher accuracy than traditional two-phase approaches. Moreover, we have created truly sparse multilayer perceptron models with over one million neurons and trained them on a typical laptop without a GPU ( https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks/tree/master/SET-MLP-Sparse-Python-Data-Structures ), which is well beyond what is possible with any state-of-the-art technique.
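The core idea behind training with a fixed parameter count is sparse evolutionary training: each layer starts from a sparse Erdős–Rényi connectivity pattern stored in a sparse data structure, and after every epoch a fraction of the smallest-magnitude weights is removed and the same number of new connections is regrown at random empty positions. The sketch below illustrates one such prune-and-regrow step on a single weight matrix using SciPy sparse storage; it is a minimal illustration of the idea, not the authors' implementation (see the linked repository for that), and the function names, the `epsilon` density parameter, and the `zeta` rewiring fraction are assumed here for demonstration.

```python
import numpy as np
import scipy.sparse as sp

def init_erdos_renyi(n_in, n_out, epsilon=20, rng=None):
    """Create a sparse weight matrix whose density follows an Erdos-Renyi rule."""
    rng = rng or np.random.default_rng(0)
    density = min(1.0, epsilon * (n_in + n_out) / (n_in * n_out))
    mask = rng.random((n_in, n_out)) < density
    weights = rng.normal(0.0, 0.1, size=(n_in, n_out)) * mask
    return sp.csr_matrix(weights)

def evolve_connections(w, zeta=0.3, rng=None):
    """Prune the zeta fraction of smallest-magnitude weights and regrow the same
    number of connections at random empty positions, keeping the parameter
    count fixed throughout training."""
    rng = rng or np.random.default_rng(0)
    w = w.tocoo()
    n_prune = int(zeta * w.nnz)
    keep = np.argsort(np.abs(w.data))[n_prune:]            # surviving connections
    rows, cols, data = w.row[keep], w.col[keep], w.data[keep]

    occupied = set(zip(rows.tolist(), cols.tolist()))
    new_rows, new_cols = [], []
    while len(new_rows) < n_prune:                          # regrow elsewhere
        r = rng.integers(0, w.shape[0])
        c = rng.integers(0, w.shape[1])
        if (r, c) not in occupied:
            occupied.add((r, c))
            new_rows.append(r)
            new_cols.append(c)
    new_data = rng.normal(0.0, 0.1, size=n_prune)

    return sp.csr_matrix(
        (np.concatenate([data, new_data]),
         (np.concatenate([rows, new_rows]), np.concatenate([cols, new_cols]))),
        shape=w.shape,
    )

# Usage: one evolution step per epoch; the number of parameters stays constant.
w = init_erdos_renyi(1000, 4000)
w = evolve_connections(w)
print(w.nnz)
```

Because the weights are kept in a sparse matrix at all times rather than masked dense arrays, memory and compute scale with the number of connections, which is what makes million-neuron models feasible on a laptop without a GPU.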
