A survey of deep network techniques all classifiers can adopt

Deep neural networks (DNNs) have introduced novel and useful tools to the machine learning community, and other types of classifiers can potentially adopt these tools to improve their performance and generality. This paper reviews the current state of the art for deep learning technologies that are being applied outside of deep neural networks. Non-neural network classifiers can employ many components found in DNN architectures. We review the feature learning, optimization, and regularization methods that form the core of deep network technologies, and we then survey non-neural network learning algorithms that make innovative use of these methods to improve classification performance. Because many opportunities and challenges remain, we conclude by discussing directions that can be pursued to expand deep learning techniques to a variety of classification algorithms.
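One concrete instance of this transfer is dropout, a regularization method introduced for neural networks that has been adapted to gradient boosted tree ensembles (DART). The sketch below is a minimal, illustrative example only, assuming the XGBoost library with its DART booster is available; the dataset and parameter values are hypothetical choices, not a prescription from this survey.

```python
# Illustration: dropout-style regularization (DART) applied to a gradient
# boosted tree ensemble rather than to a neural network.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic binary classification data for demonstration purposes.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# booster="dart" drops a random subset of previously built trees before each
# boosting round, mirroring how dropout randomly silences units in a DNN.
model = XGBClassifier(
    booster="dart",
    n_estimators=200,
    rate_drop=0.1,   # fraction of existing trees dropped per boosting round
    skip_drop=0.5,   # probability of skipping dropout on a given round
    eval_metric="logloss",
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```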
