Deep neural networks architecture driven by problem-specific information

Deep learning provides a variety of neural network-based models, known as deep neural networks (DNNs), which are successfully used in several domains to build highly accurate predictors. A key factor that usually allows DNNs to outperform traditional machine learning models is the amount of data that is nowadays accessible. Nevertheless, other factors linked to DNN topologies may also influence the predictive performance of DNN models. In particular, fully connected deep neural networks (fc-DNNs) typically struggle to achieve good performance on small datasets. This is due to the large number of parameters that must be learned when training such models, which makes them prone to over-fitting. In this paper, the authors propose using problem-specific information to impose constraints on the network architecture, so that an fc-DNN is transformed into a partially connected DNN (pc-DNN) whose topology is driven by prior knowledge. This work compares two baseline models, the elastic net and fc-DNNs, to pc-DNNs on three synthetic datasets with different numbers of samples. The synthetic data were generated to estimate the benefit of using problem-specific information to drive network architectures. Furthermore, a similar analysis is performed on a real-world dataset to show the benefits of pc-DNN models in terms of predictive performance. The results of the analysis show that pc-DNNs with built-in problem-specific information clearly outperformed the elastic net and fc-DNNs on most of the datasets used, both synthetic and real-world. The pc-DNN turned out to be a useful model, especially when applied to small- or medium-size datasets, on which it significantly outperformed the baseline models considered in this study. Specifically, the pc-DNNs achieved AUC and MSE improvement rates of (8.21%, 19.79%) and (6.65%, 20.54%) on small- and medium-size datasets for the two case studies analyzed, the synthetic and the real-world problem, respectively.
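The abstract does not specify how the connectivity constraints are implemented, but a common way to turn a fully connected layer into a partially connected one is to multiply the weight matrix by a fixed binary mask encoding the prior knowledge (e.g., which input features belong to which known group or pathway). The following is a minimal sketch of that masking idea in PyTorch; the class name PartiallyConnectedLayer and the block-diagonal example mask are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class PartiallyConnectedLayer(nn.Module):
    """Linear layer whose connectivity is fixed by a binary mask derived
    from prior, problem-specific knowledge: mask[i, j] = 1 keeps the
    connection from input feature j to hidden unit i, 0 removes it.
    (Hypothetical sketch; the paper's own implementation may differ.)"""

    def __init__(self, mask: torch.Tensor):
        super().__init__()
        out_features, in_features = mask.shape
        self.linear = nn.Linear(in_features, out_features)
        # Buffer, not parameter: the mask moves with the model but is never trained.
        self.register_buffer("mask", mask.float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Masked connections contribute nothing and receive zero gradient,
        # so the effective number of trainable weights shrinks accordingly.
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

# Example: 6 input features split into 2 prior-knowledge groups of 3,
# each group wired to exactly one hidden unit (block-diagonal connectivity).
mask = torch.zeros(2, 6)
mask[0, 0:3] = 1.0
mask[1, 3:6] = 1.0
layer = PartiallyConnectedLayer(mask)
print(layer(torch.randn(4, 6)).shape)  # torch.Size([4, 2])
```

Because pruned weights are held at zero rather than learned, a layer like this has far fewer effective parameters than its fully connected counterpart, which is consistent with the paper's motivation for why pc-DNNs over-fit less on small- and medium-size datasets.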
