Lifelong Credit Assignment with the Success-Story Algorithm

Consider an embedded agent with a self-modifying, Turing-equivalent policy that can change only through active self-modifications. How can we make sure that it learns to continually accelerate its reward intake? Throughout its life the agent remains ready to undo any self-modification generated at any earlier point of its life, provided the reward per unit time measured since then has not increased. This enforces a lifelong success story of self-modifications, each followed by long-term reward acceleration up to the present time. The stack-based method for enforcing this is called the success-story algorithm (SSA). It fully takes into account that early self-modifications set the stage for later ones (learning a learning algorithm), and it automatically learns to extend self-evaluations until the collected reward statistics are reliable: a very simple but general method waiting to be re-discovered! Time permitting, I will also briefly discuss more recent, mathematically optimal, universal maximizers of lifelong reward, in particular the fully self-referential Gödel machine.
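
To make the stack-based bookkeeping concrete, here is a minimal Python sketch of the success-story criterion and its enforcement. All names (SuccessStoryAlgorithm, record_modification, ssa_check, undo_fn) are hypothetical illustrations, not from the original talk; the sketch assumes each self-modification stores enough undo information to be reverted, that time and cumulative reward are observable scalars, and that checkpoints are popped from the top of the stack until the reward-per-time inequalities hold for every surviving checkpoint.

    class SuccessStoryAlgorithm:
        """Sketch of SSA bookkeeping: one stack entry per surviving self-modification."""

        def __init__(self):
            # Each entry: (time of modification, cumulative reward at that time, undo info).
            self.stack = []

        def record_modification(self, t, cumulative_reward, undo_info):
            """Push a checkpoint whenever the agent modifies its own policy."""
            self.stack.append((t, cumulative_reward, undo_info))

        def ssc_holds(self, t_now, reward_now):
            """Success-story criterion: reward per unit time since each surviving
            checkpoint must strictly increase from older to newer checkpoints,
            starting from the average reward rate over the agent's whole life.
            Assumes t_now > 0 and t_now > t_i for every checkpoint."""
            prev_rate = reward_now / t_now  # reward per time since birth (time 0)
            for t_i, r_i, _ in self.stack:
                rate = (reward_now - r_i) / (t_now - t_i)
                if rate <= prev_rate:
                    return False
                prev_rate = rate
            return True

        def ssa_check(self, t_now, reward_now, undo_fn):
            """Pop and undo the most recent self-modifications until the
            success-story criterion holds for everything that remains."""
            while self.stack and not self.ssc_holds(t_now, reward_now):
                _, _, undo_info = self.stack.pop()
                undo_fn(undo_info)

Because checkpoints are undone in reverse order, an early modification survives only as long as the modifications it enabled keep the reward rate growing; this is how the stack holds early self-modifications accountable for setting the stage for later ones. And since a checkpoint is only ever tested against reward collected up to the present moment, self-evaluations automatically extend until the statistics since each checkpoint are reliable.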