Measuring the Data Efficiency of Deep Learning Methods

In this paper, we propose a new experimental protocol and use it to benchmark the data efficiency (performance as a function of training set size) of two deep learning algorithms, convolutional neural networks (CNNs) and hierarchical information-preserving graph-based slow feature analysis (HiGSFA), on classification and transfer learning tasks. The algorithms are trained on subsets of the MNIST and Omniglot data sets of varying size. HiGSFA outperforms standard CNNs on MNIST classification when the models are trained on 50 and 200 samples per class; in the other cases, the CNNs perform better. The results suggest that there are cases in which greedy, locally optimal bottom-up learning is as powerful as, or more powerful than, global gradient-based learning.
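A minimal sketch of the data-efficiency protocol described above, assuming PyTorch and torchvision: the same model is trained on nested subsets of MNIST containing N samples per class, and test accuracy is recorded as a function of N. The CNN architecture, optimizer settings, epoch count, and the `per_class_subset` helper are illustrative assumptions for this sketch, not the exact setup used in the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def per_class_subset(dataset, n_per_class, num_classes=10, seed=0):
    """Return indices of n_per_class randomly chosen examples of each class."""
    g = torch.Generator().manual_seed(seed)
    order = torch.randperm(len(dataset), generator=g)
    counts = [0] * num_classes
    indices = []
    for idx in order.tolist():
        label = int(dataset.targets[idx])
        if counts[label] < n_per_class:
            indices.append(idx)
            counts[label] += 1
        if len(indices) == n_per_class * num_classes:
            break
    return indices

def make_cnn():
    # Small illustrative CNN for 28x28 grayscale inputs.
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Linear(128, 10),
    )

def evaluate(model, loader, device):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.size(0)
    return correct / total

def run_protocol(sizes=(50, 200, 1000), epochs=20):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tfm = transforms.ToTensor()
    train_full = datasets.MNIST("data", train=True, download=True, transform=tfm)
    test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
    test_loader = DataLoader(test_set, batch_size=256)
    results = {}
    for n in sizes:
        train_loader = DataLoader(
            Subset(train_full, per_class_subset(train_full, n)),
            batch_size=64, shuffle=True)
        model = make_cnn().to(device)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            model.train()
            for x, y in train_loader:
                opt.zero_grad()
                loss_fn(model(x.to(device)), y.to(device)).backward()
                opt.step()
        results[n] = evaluate(model, test_loader, device)
        print(f"{n} samples/class -> test accuracy {results[n]:.4f}")
    return results

if __name__ == "__main__":
    run_protocol()
```

Holding the architecture, optimizer, and training budget fixed across all subset sizes is the point of the protocol: the resulting accuracy-versus-N curve is what the paper refers to as data efficiency, and an analogous loop with an HiGSFA pipeline in place of the CNN would produce the competing curve.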
