Sequential Decision Making Based on Direct Search
[1] Jürgen Schmidhuber,et al. Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.
[2] Douglas B. Lenat,et al. Theory Formation by Heuristic Search , 1983, Artificial Intelligence.
[3] Hans-Paul Schwefel,et al. Evolution and Optimum Seeking: The Sixth Generation , 1993 .
[4] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[5] Hans-Paul Schwefel,et al. Evolution and optimum seeking , 1995, Sixth-generation computer technology series.
[6] Andrew W. Moore,et al. Multi-Value-Functions: Efficient Automatic Action Hierarchies for Multiple Goal MDPs , 1999, IJCAI.
[7] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[8] Jürgen Schmidhuber,et al. Solving POMDPs with Levin Search and EIRA , 1996, ICML.
[9] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[10] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.
[11] Rafal Salustowicz,et al. Probabilistic Incremental Program Evolution , 1997, Evolutionary Computation.
[12] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System , 1994, Evolutionary Computation.
[13] Frank Kirchner. Q-learning of complex behaviours on a six-legged walking machine , 1998, Robotics Auton. Syst.
[14] Ingo Rechenberg,et al. Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .
[15] Garrison W. Cottrell,et al. Learning Mackey-Glass from 25 Examples, Plus or Minus 2 , 1993, NIPS.
[16] Jürgen Schmidhuber,et al. HQ-Learning , 1997, Adapt. Behav.
[17] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[18] Martin Wattenberg,et al. Stochastic Hillclimbing as a Baseline Method for Evaluating Genetic Algorithms , 1995, NIPS.
[19] W. Vent,et al. Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 S. mit 36 Abb. Frommann‐Holzboog‐Verlag. Stuttgart 1973. Broschiert , 1975 .
[20] Peter Dayan,et al. Exploration bonuses and dual control , 1996 .
[21] Leonid A. Levin,et al. Randomness Conservation Inequalities; Information and Independence in Mathematical Theories , 1984, Inf. Control.
[22] W. J. Studden,et al. Theory Of Optimal Experiments , 1972 .
[23] Mark Humphrys,et al. Action Selection methods using Reinforcement Learning , 1996 .
[24] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[25] Jürgen Schmidhuber,et al. Discovering Predictable Classifications , 1993, Neural Computation.
[26] Gregory J. Chaitin,et al. On the Length of Programs for Computing Finite Binary Sequences: statistical considerations , 1969, JACM.
[27] Kagan Tumer,et al. Using Collective Intelligence to Route Internet Traffic , 1998, NIPS.
[28] Sandip Sen,et al. Adaption and Learning in Multi-Agent Systems , 1995, Lecture Notes in Computer Science.
[29] William I. Gasarch,et al. Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series)) , 1997, SIGACT News.
[30] Igor Durdanovic,et al. Toward Code Evolution by Artificial Economies , 2002 .
[31] M. Veloso,et al. Bounding the suboptimality of reusing subproblems , 1999, IJCAI 1999.
[32] Gerhard Weiß. Hierarchical Chunking in Classifier Systems , 1994 .
[33] A. Kolmogorov. Three approaches to the quantitative definition of information , 1968 .
[34] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[35] Gerhard Weiß,et al. Adaptation and Learning in Multi-Agent Systems: Some Remarks and a Bibliography , 1995, Adaption and Learning in Multi-Agent Systems.
[36] Peter Nordin,et al. Genetic programming - An Introduction: On the Automatic Evolution of Computer Programs and Its Applications , 1998 .
[37] Gerhard Weiß,et al. Hierarchical Chunking in Classifier Systems , 1994, AAAI.
[38] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[39] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[40] Chen K. Tham,et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture , 1995, Robotics Auton. Syst.
[41] Satinder P. Singh,et al. The Efficient Learning of Multiple Task Sequences , 1991, NIPS.
[42] Jieyu Zhao,et al. Direct Policy Search and Uncertain Policy Evaluation , 1998 .
[43] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control.
[44] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[45] Jürgen Schmidhuber,et al. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability , 1997, Neural Networks.
[46] Mark B. Ring. Learning Sequential Tasks by Incrementally Adding Higher Orders , 1992, NIPS.
[47] Mark B. Ring. Incremental Development of Complex Behaviors , 1991, ML.
[48] Astro Teller,et al. The evolution of mental models , 1994 .
[49] Stewart W. Wilson. Classifier Fitness Based on Accuracy , 1995, Evolutionary Computation.
[50] Jürgen Schmidhuber,et al. Learning to generate sub-goals for action sequences , 1991 .
[51] David J. C. MacKay,et al. Information-Based Objective Functions for Active Data Selection , 1992, Neural Computation.
[52] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .
[53] Balaraman Ravindran,et al. Improved Switching among Temporally Abstract Actions , 1998, NIPS.
[54] Jürgen Schmidhuber,et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks , 1989 .
[55] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[56] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[57] Jürgen Schmidhuber,et al. LSTM can Solve Hard Long Time Lag Problems , 1996, NIPS.
[58] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell.
[59] Jenq-Neng Hwang,et al. Query-based learning applied to partially trained multilayer perceptrons , 1991, IEEE Trans. Neural Networks.
[60] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[61] Jürgen Schmidhuber,et al. Artificial curiosity based on discovering novel algorithmic predictability through coevolution , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).
[62] Michael Kearns,et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.
[63] C. E. Shannon,et al. A mathematical theory of communication , 1948, MOCO.
[64] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[65] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[66] Ray J. Solomonoff,et al. The Application of Algorithmic Probability to Problems in Artificial Intelligence , 1985, UAI.
[67] R. Bellman,et al. V. Adaptive Control Processes , 1964 .
[68] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[69] Pattie Maes,et al. Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments , 1996 .
[70] David A. Cohn,et al. Neural Network Exploration Using Optimal Experiment Design , 1993, NIPS.
[71] Nichael Lynn Cramer,et al. A Representation for the Adaptive Generation of Simple Sequential Programs , 1985, ICGA.
[72] Ron Sun,et al. Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors , 2000, IEEE Trans. Syst. Man Cybern. Part B.
[73] John H. Holland,et al. Properties of the Bucket Brigade , 1985, ICGA.
[74] Wolfgang Banzhaf,et al. Genetic Programming: An Introduction , 1997 .
[75] Maja J. Matarić,et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .
[76] Jürgen Schmidhuber. Discovering Solutions with Low Kolmogorov Complexity and High Generalization Capability , 1995, ICML.