Knowing When to Abandon Unproductive Learning

Thomas R. Shultz (thomas.shultz@mcgill.ca)
Department of Psychology and School of Computer Science, McGill University
1205 Penfield Avenue, Montreal QC, Canada H3A 1B1

Eric Doty (eric.doty@mail.mcgill.ca)
Department of Psychology, McGill University
1205 Penfield Avenue, Montreal QC, Canada H3A 1B1

Frederic Dandurand (frederic.dandurand@gmail.com)
Department of Psychology, Universite de Montreal
90 ave. Vincent-d'Indy, Montreal, QC H2V 2S9 Canada

Abstract

Autonomous learning is the ability to learn effectively without much external assistance, which is a desirable characteristic in both engineering and computational modeling. We extend a constructive neural-learning algorithm, sibling-descendant cascade-correlation (SDCC), to monitor lack of progress in learning in order to autonomously abandon unproductive learning. The extended algorithm simulates results of recent experiments with infants who abandon learning on difficult tasks. It also avoids network overtraining effects in a more realistic manner than conventional use of validation test sets. Some contributions and limitations of constructive neural networks for achieving autonomy in learning are briefly assessed.

Keywords: autonomous learning; abandoning learning; constructive neural networks; SDCC.

Introduction

Autonomous learning is the ability to learn effectively without much external assistance. As such, autonomy is a desired quality in fields such as machine learning and artificial intelligence, where the effectiveness of learning systems is seriously compromised whenever human intervention is required. It is likewise a desired feature in cognitive science, where a goal is to understand the adaptive functioning of humans and other biological agents in their natural environments. An important characteristic of autonomous learners is that they can shape their own learning and development, in large part by choosing which problems to work on. Such choices include selecting a problem to learn and deciding whether to continue learning on the selected task or abandon it in favor of something else.

Knowing When to Quit

Knowing when to stop learning has two obvious components: quitting when the problem has been mastered, and quitting when it is unlikely to be mastered. In the constructive neural networks that we favor, victory is declared, and learning terminated, when the network is correct on all training examples, in the sense of producing outputs that are within some score-threshold of their target values (Fahlman & Lebiere, 1990; Shultz, 2003).

Cessation of learning without mastery is considerably more problematic, despite being an important component of autonomous learning in biological agents. It may be useful to analyze such early quitting in terms of costs and benefits. The total cost of learning can be conceptualized as energy expenditure (of the learning effort) plus opportunity cost (the value of the best alternative not chosen, whether other learning or exploitation of resources):

Cost_total = Energy_learning + Cost_opportunity.

The net payoff of learning is then the benefit of successful learning minus the total cost of learning:

Payoff_net = Benefit_learning - Cost_total.

In continuing to work on an unlearnable problem, there would be a large negative payoff: cost without benefit. Having started to learn such a difficult problem, it could be sensible to abandon it when lack of progress becomes evident.

Previous Work on Abandoning Learning

Recent computational modeling suggests that a key factor in deciding to abandon learning early is whether learning progress is being made (Schmidhuber, 2005, 2010). In that work, learning progress is monitored by tracking the first derivative of error reduction to identify intrinsic rewards, while a reinforcement-learning module selects actions to maximize future intrinsic rewards.
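This progress-tracking idea can be illustrated with a minimal sketch. The function name and the error values below are illustrative assumptions, not taken from Schmidhuber's implementation: the intrinsic reward at each step is simply the first derivative of error reduction, i.e. how much prediction error dropped since the previous step.

```python
def intrinsic_rewards(errors):
    """First derivative of error reduction: reward_t = error_{t-1} - error_t.

    Positive rewards indicate learning progress; rewards near zero mean the
    learner is no longer improving (or the problem is unlearnable).
    """
    return [prev - curr for prev, curr in zip(errors, errors[1:])]

# A learnable problem: error keeps dropping, so rewards stay positive.
print(intrinsic_rewards([1.0, 0.5, 0.25, 0.125]))  # [0.5, 0.25, 0.125]

# An unlearnable problem: error stagnates, so rewards fall to zero.
print(intrinsic_rewards([1.0, 0.75, 0.75, 0.75]))  # [0.25, 0.0, 0.0]
```

A reinforcement-learning module, as in the work cited above, would then select whichever actions are expected to maximize these rewards in the future.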
These models curiously conflate novelty with learning success; it seems more correct to base novelty on initial error and to compute learning success as recent progress in error reduction. These models also require a reinforcement-learning controller that selects actions, and an external network to track learning progress. It seems simpler to continue learning by default until lack of progress is detected, perhaps in terms of stagnation in error reduction.

In an idealized learning model, infant looking was modeled by information-theoretic properties of stimuli (Kidd, Piantadosi, & Aslin, 2010). The negative log probability of an event (corresponding to the number of bits of information conveyed by a stimulus) was conditioned on observing previous events: the larger the negative log probability, the more surprising the current event. As predicted, 7- to 8-month-old infants were more likely to look away from either highly informative or uninformative events. The authors dubbed this the Goldilocks effect, as infants preferred stimuli that were neither too predictable nor too surprising.
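The simpler default strategy suggested earlier, continuing to learn until error reduction stagnates, could be sketched as a patience check on the training-error history. This is a hypothetical illustration; the `patience` and `min_drop` parameters are assumptions for the sketch, not thresholds from the SDCC implementation:

```python
def should_abandon(error_history, patience=3, min_drop=0.01):
    """Abandon learning when training error has failed to drop by at least
    `min_drop` on each of the last `patience` steps.

    Mastery is checked elsewhere (all outputs within a score-threshold of
    their targets); this function only detects stagnation without mastery.
    """
    if len(error_history) <= patience:
        return False  # too early to judge lack of progress
    recent = error_history[-(patience + 1):]
    drops = [prev - curr for prev, curr in zip(recent, recent[1:])]
    return all(drop < min_drop for drop in drops)

# Error still falling: continue learning by default.
print(should_abandon([1.0, 0.8, 0.6, 0.4, 0.2]))  # False

# Error flat for three consecutive steps: quit without mastery.
print(should_abandon([1.0, 0.9, 0.9, 0.9, 0.9]))  # True
```

Unlike the reinforcement-learning approach, no separate controller or progress-tracking network is needed: the learner's own error history supplies the stopping signal.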

References

[1] L. Gerken et al. Infants avoid 'labouring in vain' by attending more to learnable than unlearnable linguistic patterns. Developmental Science, 2011.

[2] T. Shultz. Computational Developmental Psychology. 2003.

[3] J. Schmidhuber. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2010.

[4] T. Shultz et al. Why let networks grow. 2007.

[5] T. R. Shultz et al. Analysis of Unstandardized Contributions in Cross Connected Networks. NIPS, 1994.

[6] C. Kidd, S. Piantadosi, R. N. Aslin. The Goldilocks Effect: Infants' preference for stimuli that are neither too predictable nor too surprising. 2010.

[7] J. L. Elman et al. Analyzing Cross-Connected Networks. NIPS, 1993.

[8] J. Schmidhuber. Self-Motivated Development Through Rewards for Predictor Errors / Improvements. AAAI, 2005.

[9] S. Baluja et al. Reducing Network Depth in the Cascade-Correlation Learning Architecture. 1994.

[10] R. Catrambone et al. Proceedings of the 32nd Annual Conference of the Cognitive Science Society. 2010.

[11] T. R. Shultz et al. Could Knowledge-Based Neural Learning be Useful in Developmental Robotics? The Case of KBCC. International Journal of Humanoid Robotics, 2007.

[12] D. Stephens et al. Components of change in the evolution of learning and unlearned preference. Proceedings of the Royal Society B: Biological Sciences, 2009.

[13] T. R. Shultz et al. Knowledge-based cascade-correlation: Using knowledge to speed learning. Connection Science, 2001.