HQ-Learning
[1] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[2] W. J. Studden, et al. Theory of Optimal Experiments, 1972.
[3] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.
[4] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.
[5] Jürgen Schmidhuber, et al. Learning to generate sub-goals for action sequences, 1991.
[6] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[7] Satinder P. Singh, et al. The Efficient Learning of Multiple Task Sequences, 1991, NIPS.
[8] S. Thrun. Efficient Exploration in Reinforcement Learning, 1992.
[9] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[10] Jürgen Schmidhuber, et al. Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.
[11] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[12] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.
[13] Satinder Singh. The Efficient Learning of Multiple Task Sequences, 1992.
[14] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[15] Steven Douglas Whitehead, et al. Reinforcement learning for the adaptive control of perception and action, 1992.
[16] David A. Cohn, et al. Neural Network Exploration Using Optimal Experiment Design, 1993, NIPS.
[17] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[18] Astro Teller, et al. The evolution of mental models, 1994.
[19] Dana Ron, et al. Learning probabilistic automata with variable memory length, 1994, COLT '94.
[20] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.
[21] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[22] Dave Cliff, et al. Adding Temporary Memory to ZCS, 1994, Adapt. Behav.
[23] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System, 1994, Evolutionary Computation.
[24] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[25] S. Hochreiter, et al. Reinforcement Driven Information Acquisition in Nondeterministic Environments, 1995.
[26] Chen K. Tham, et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture, 1995, Robotics Auton. Syst.
[27] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[28] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[29] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[30] Stewart W. Wilson. Classifier Fitness Based on Accuracy, 1995, Evolutionary Computation.
[31] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[32] Yoshua Bengio, et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies, 1995, NIPS.
[33] Pattie Maes, et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments, 1996.
[34] Explore/Exploit Strategies in Autonomy, 1996.
[35] Wenju Liu, et al. Planning in Stochastic Domains: Problem Characteristics and Approximation, 1996.
[36] Bruce L. Digney. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments, 1996.
[37] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.
[38] Jürgen Schmidhuber, et al. Solving POMDPs with Levin Search and EIRA, 1996, ICML.
[39] Maja J. Matarić, et al. Action Selection Methods Using Reinforcement Learning, 1996.
[40] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[41] Jürgen Schmidhuber, et al. Incremental self-improvement for life-time multi-agent reinforcement learning, 1996.
[42] Maja J. Matarić, et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, 1996.
[43] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[44] J. Schmidhuber. What's interesting?, 1997.
[45] Rafal Salustowicz, et al. Probabilistic Incremental Program Evolution, 1997, Evolutionary Computation.
[46] Jürgen Schmidhuber, et al. Probabilistic Incremental Program Evolution: Stochastic Search Through Program Space, 1997, ECML.
[47] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[48] Jürgen Schmidhuber, et al. Reinforcement Learning with Self-Modifying Policies, 1998, Learning to Learn.
[49] Reid G. Simmons, et al. The Effect of Representation and Knowledge on Goal-Directed Exploration with Reinforcement-Learning Algorithms, 2005, Machine Learning.