Intrinsically motivated information foraging

We treat information gathering as a POMDP in which the goal is to maximize accumulated intrinsic reward, where the reward at each time step is the negative entropy of the agent's beliefs about the world state. We show that such information-foraging agents can discover intelligent exploration policies that take into account the long-term effects of sensor and motor actions, and can automatically adapt to variations in sensor noise, different amounts of prior information, and limited-memory conditions.
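As a minimal illustration of the reward signal (a sketch, not the paper's implementation), the per-step intrinsic reward can be computed as the negative Shannon entropy of the agent's current belief vector; the belief update below assumes a generic discrete Bayes filter, and the function names are hypothetical:

```python
import numpy as np

def negentropy_reward(belief, eps=1e-12):
    """Intrinsic reward: negative Shannon entropy of the belief state.

    `belief` is a probability vector over discrete world states; the
    reward is highest (closest to zero) when the agent is certain.
    """
    b = np.clip(belief, eps, 1.0)
    return float(np.sum(b * np.log(b)))  # -H(b) = sum_s b(s) log b(s)

def bayes_update(belief, likelihood):
    """One discrete Bayes filter step: fold in an observation
    likelihood p(o | s), one entry per state, and renormalize."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Example: an informative observation sharpens a uniform belief over
# four states, increasing the intrinsic reward.
belief = np.full(4, 0.25)
print(negentropy_reward(belief))  # -log(4) ~ -1.386
belief = bayes_update(belief, np.array([0.8, 0.1, 0.05, 0.05]))
print(negentropy_reward(belief))  # ~ -0.71, i.e. lower entropy
```

Under this reward, a policy that maximizes expected cumulative return is driven to choose sensor and motor actions whose observations are expected to reduce belief entropy the most over the long run.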
