The Eighty Five Percent Rule for optimal learning

Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent-based learning algorithms we find that the optimal error rate for training is approximately 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and for biologically plausible neural networks thought to describe human and animal learning.
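The 15.87% figure follows from the Gaussian-noise case of the derivation: with normalized training difficulty δ (stimulus difficulty divided by decision noise), the error rate is Φ(−δ) and the expected learning speed of a gradient-descent learner is proportional to δ·exp(−δ²/2), which peaks at δ = 1, giving an optimal error rate of Φ(−1) ≈ 0.1587. The short Python sketch below verifies this numerically; it illustrates the Gaussian case only and is our own simplification (a grid search), not the authors' released code.

import numpy as np
from scipy.stats import norm

# Normalized difficulty delta = (stimulus difficulty) / (decision noise).
# Under Gaussian noise the training error rate is ER(delta) = Phi(-delta),
# and the expected learning speed of a gradient-descent learner is
# proportional to delta * exp(-delta^2 / 2) (up to a constant factor).
deltas = np.linspace(0.01, 3.0, 10_000)
learning_speed = deltas * np.exp(-deltas**2 / 2)

best = deltas[np.argmax(learning_speed)]
print(f"optimal normalized difficulty: {best:.3f}")                 # ~ 1.000
print(f"optimal training error rate:   {norm.cdf(-best):.4f}")      # ~ 0.1587
print(f"optimal training accuracy:     {1 - norm.cdf(-best):.2%}")  # ~ 84.13%

Note that 1 − Φ(−1) ≈ 84.13%, which rounds to the ‘about 85%’ accuracy quoted in the abstract.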
