Paper information - REINFORCEMENT DRIVEN INFORMATION ACQUISITION IN NONDETERMINISTIC ENVIRONMENTS

For an agent living in a nondeterministic Markov environment (NME), what is, in theory, the fastest way of acquiring information about its statistical properties? The answer is: to design “optimal” sequences of “experiments” by performing action sequences that maximize expected information gain. This notion is implemented by combining concepts from information theory and reinforcement learning. Experiments show that the resulting method, reinforcement driven information acquisition, can explore certain NMEs much faster than conventional random exploration.
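The abstract states the idea without giving formulas, so the following is only a minimal sketch of how expected information gain could serve as a reinforcement signal. It rests on several assumptions that are not taken from the abstract: information gain is measured as the Kullback-Leibler divergence between the agent's transition estimate after and before an observation, the estimates start from a Laplace prior (all counts at 1) so the divergence stays finite, and the reinforcement learner is tabular Q-learning with epsilon-greedy action selection. The class name `InfoGainQLearner`, the helper `sample_next_state`, and the two-state environment are hypothetical.

```python
import math
import random


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class InfoGainQLearner:
    """Tabular Q-learner rewarded by the change in its own transition model."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.n_states = n_states
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Assumption: Laplace prior (every count starts at 1) keeps KL finite.
        self.counts = [[[1.0] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]
        self.q = [[0.0] * n_actions for _ in range(n_states)]

    def model(self, s, a):
        """Current estimate of P(next state | s, a) from the counts."""
        row = self.counts[s][a]
        total = sum(row)
        return [c / total for c in row]

    def act(self, s):
        """Epsilon-greedy choice over Q-values, ties broken at random."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        best = max(self.q[s])
        return self.rng.choice(
            [a for a in range(self.n_actions) if self.q[s][a] == best])

    def observe(self, s, a, s2):
        """Update the model, then use the resulting information gain as reward."""
        before = self.model(s, a)
        self.counts[s][a][s2] += 1.0
        after = self.model(s, a)
        reward = kl_divergence(after, before)
        target = reward + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
        return reward


def sample_next_state(transitions, s, a, rng):
    """Draw a successor state from the true (hidden) transition probabilities."""
    weights = transitions[s][a]
    return rng.choices(range(len(weights)), weights=weights)[0]


# Hypothetical 2-state, 2-action environment: action 0 is deterministic,
# action 1 is stochastic and therefore takes longer to learn.
transitions = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.3, 0.7]],
]
env_rng = random.Random(1)
agent = InfoGainQLearner(n_states=2, n_actions=2)
state = 0
for _ in range(200):
    action = agent.act(state)
    next_state = sample_next_state(transitions, state, action, env_rng)
    agent.observe(state, action, next_state)
    state = next_state
```

In this sketch the reward for a state-action pair shrinks as its estimate stabilizes, so the Q-values steer the agent toward pairs whose outcomes are still poorly modeled. This is intended to mirror the contrast with random exploration drawn in the abstract, not to reproduce the authors' experiments.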

S. Hochreiter | Jan Storck
