Adaptive modelling and planning for learning intelligent behaviour