Reinforcement Learning in Supervised Problem Domains
[1] John F. Kolen, et al. Evaluating Benchmark Problems by Random Guessing, 2001.
[2] William F. Punch, et al. Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm, 2003, IEEE Trans. Syst. Man Cybern. Part B.
[3] Yee Whye Teh, et al. Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation, 2006, Cogn. Sci.
[4] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.
[5] G. Lewicki, et al. Approximation by Superpositions of a Sigmoidal Function, 2003.
[6] Parametric Policy Gradients for Robotics, 2008.
[7] Radford M. Neal. Pattern Recognition and Machine Learning, 2007, Technometrics.
[8] Jürgen Schmidhuber, et al. A Python Experiment Suite, 2011.
[9] Howard Straubing. Finite Automata, Formal Logic, and Circuit Complexity, 1994, Progress in Theoretical Computer Science.
[10] Shin Ishii, et al. Reinforcement learning for a biped robot based on a CPG-actor-critic method, 2007, Neural Networks.
[11] Mark B. Ring. Learning Sequential Tasks by Incrementally Adding Higher Orders, 1992, NIPS.
[12] Mark B. Ring. CHILD: A First Step Towards Continual Learning, 1998, Learning to Learn.
[13] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[14] S. Dreyfus. The computational solution of optimal control problems with time lag, 1973.
[15] Jürgen Schmidhuber, et al. Policy Gradient Critics, 2007, ECML.
[16] Kumpati S. Narendra, et al. Learning Automata - A Survey, 1974, IEEE Trans. Syst. Man Cybern.
[17] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[18] M. A. Wiering, et al. Reinforcement Learning in Continuous Action Spaces, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[19] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[20] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.
[21] Majid Nili Ahmadabadi, et al. Face recognition using reinforcement learning, 2004, 2004 International Conference on Image Processing (ICIP '04).
[22] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[23] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[24] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[25] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[26] Kilian Q. Weinberger, et al. Feature hashing for large scale multitask learning, 2009, ICML '09.
[27] Eibe Frank, et al. Speeding Up Logistic Model Tree Induction, 2005, PKDD.
[28] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[29] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: Recent progress, 1967.
[30] Frank Sehnke, et al. Parameter-exploring policy gradients, 2010, Neural Networks.
[31] Jürgen Schmidhuber, et al. Probabilistic Incremental Program Evolution: Stochastic Search Through Program Space, 1997, ECML.
[32] Patrick Gallinari, et al. Text Classification: A Sequential Reading Approach, 2011, ECIR.
[33] Alex Graves, et al. Supervised Sequence Labelling with Recurrent Neural Networks, 2012, Studies in Computational Intelligence.
[34] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[35] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.
[36] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.
[37] Christian Osendorfer, et al. Minimizing data consumption with sequential online feature selection, 2013, Int. J. Mach. Learn. Cybern.
[38] Majid Nili Ahmadabadi, et al. Attention control with reinforcement learning for face recognition under partial occlusion, 2011, Machine Vision and Applications.
[39] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, 1992, Machine Learning.
[40] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, 1991 IEEE International Joint Conference on Neural Networks.
[41] Michail G. Lagoudakis, et al. Binary action search for learning continuous-action control policies, 2009, ICML '09.
[42] Jürgen Schmidhuber, et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.
[43] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[44] Jürgen Schmidhuber, et al. Efficient model-based exploration, 1998.
[45] Sepp Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen (Investigations on Dynamic Neural Networks), 1991.
[46] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[47] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[48] S. Timmer, et al. Fitted Q Iteration with CMACs, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[49] James Theiler, et al. Online Feature Selection using Grafting, 2003, ICML.
[50] Lucas Paletta, et al. Active object recognition by view integration and reinforcement learning, 2000, Robotics Auton. Syst.
[51] Sebastian Thrun. The role of exploration in learning control, 1992.
[52] R. Bellman, et al. Dynamic Programming and Markov Processes, 1960.
[53] José del R. Millán, et al. Continuous-Action Q-Learning, 2002, Machine Learning.
[54] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.
[55] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[56] Csaba Szepesvári, et al. Fitted Q-iteration in continuous action-space MDPs, 2007, NIPS.
[57] Jürgen Schmidhuber, et al. First Experiments with PowerPlay, 2012, Neural Networks.
[58] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[59] Jürgen Schmidhuber, et al. Co-evolving recurrent neurons learn deep memory POMDPs, 2005, GECCO '05.
[60] Jun Nakanishi, et al. Learning Movement Primitives, 2005, ISRR.
[61] Michèle Sebag, et al. Feature Selection as a One-Player Game, 2010, ICML.
[62] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[63] Stefan Schaal, et al. Locally Weighted Projection Regression: An O(n) Algorithm for Incremental Real Time Learning in High Dimensional Space, 2000.
[64] Jürgen Schmidhuber, et al. Artificial curiosity based on discovering novel algorithmic predictability through coevolution, 1999, Proceedings of the 1999 Congress on Evolutionary Computation (CEC99).
[65] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[66] Douglas Aberdeen. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[67] Pieter Abbeel, et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight, 2006, NIPS.
[68] Leemon C. Baird. Reinforcement Learning With High-Dimensional, Continuous Actions, 1993.
[69] Tim Kovacs, et al. On the analysis and design of software for reinforcement learning, with a survey of existing systems, 2011, Machine Learning.
[70] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[71] Jan Peters, et al. Fitted Q-iteration by Advantage Weighted Regression, 2008, NIPS.
[72] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[73] Tom Schaul, et al. Artificial curiosity for autonomous space exploration, 2011.
[74] Ken-ichi Funahashi, et al. On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.
[75] John E. Laird, et al. Learning to play, 2009.
[76] Jürgen Schmidhuber, et al. Simple Algorithmic Principles of Discovery, Subjective Beauty, Selective Attention, Curiosity & Creativity, 2007, Discovery Science.
[77] S. Dreyfus. The numerical solution of variational problems, 1962.
[78] Martin A. Riedmiller, et al. Reinforcement learning in feedback control, 2011, Machine Learning.
[79] Jürgen Schmidhuber, et al. Self-Delimiting Neural Networks, 2012, arXiv.
[80] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explor.
[81] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[82] Manuel Lopes, et al. Learning grasping affordances from local visual descriptors, 2009, 2009 IEEE 8th International Conference on Development and Learning.
[83] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[84] Rafal Salustowicz, et al. Probabilistic Incremental Program Evolution, 1997, Evolutionary Computation.
[85] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[86] Peter Stagge, et al. Recurrent neural networks for time series classification, 2003, Neurocomputing.
[87] Sepp Hochreiter. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions, 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst.
[88] Tom Schaul, et al. Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients, 2010, ICANN.
[89] Stewart W. Wilson, et al. A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers, 1991.
[90] Matthew Saffell. Learning to trade via direct reinforcement, 2001, IEEE Trans. Neural Networks.
[91] Fang Liu, et al. Reinforcement learning-based feature learning for object tracking, 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).
[92] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[93] Lucas Paletta, et al. Q-learning of sequential attention for visual object recognition from informative local descriptors, 2005, ICML.
[94] Henry J. Kelley, et al. Gradient Theory of Optimal Flight Paths, 1960.
[95] Sebastian Thrun. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[96] Richard S. Sutton. Open Theoretical Questions in Reinforcement Learning, 1999, EuroCOLT.
[97] Lonnie Chrisman. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[98] Jürgen Schmidhuber, et al. Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY, 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[99] George E. Monahan. A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, 1982, Management Science.
[100] Frank Sehnke, et al. Robot Learning with State-Dependent Exploration, 2008.
[101] Ludovic Denoyer, et al. Datum-Wise Classification: A Sequential Approach to Sparsity, 2011, ECML/PKDD.
[102] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[103] Ravi Sankar, et al. Time Series Prediction Using Support Vector Machines: A Survey, 2009, IEEE Computational Intelligence Magazine.
[104] Bernard Widrow, et al. Associative Storage and Retrieval of Digital Information in Networks of Adaptive “Neurons”, 1962.
[105] Hiroshi Motoda, et al. Computational Methods of Feature Selection, 2007.
[106] Jürgen Schmidhuber, et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients, 2007, ICANN.
[107] Jürgen Schmidhuber, et al. Multi-dimensional Recurrent Neural Networks, 2007, ICANN.
[108] Jürgen Schmidhuber, et al. Learning Precise Timing with LSTM Recurrent Networks, 2003, J. Mach. Learn. Res.
[109] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[110] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[111] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[112] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[113] Stefan Schaal. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[114] Guy Shani, et al. Resolving Perceptual Aliasing in the Presence of Noisy Sensors, 2004, NIPS.
[115] Sepp Hochreiter, et al. Bridging Long Time Lags by Weight Guessing and "Long Short-Term Memory", 1996.
[116] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[117] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.
[118] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[119] A. Roth, et al. Predicting How People Play Games: Reinforcement Learning in Experimental Games with Unique, Mixed Strategy Equilibria, 1998.
[120] Jürgen Schmidhuber, et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[121] Raphaël Marée, et al. Reinforcement Learning with Raw Image Pixels as Input State, 2006, IWICPAS.
[122] Alin Albu-Schäffer, et al. Learning from demonstration: repetitive movements for autonomous service robotics, 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[123] Nando de Freitas, et al. Learning attentional policies for tracking and recognition in video with deep networks, 2011, ICML.
[124] Christian Osendorfer, et al. Sequential Feature Selection for Classification, 2011, Australasian Conference on Artificial Intelligence.
[125] Jürgen Schmidhuber, et al. Python Experiment Suite Implementation, 2011.
[126] H. Wold. Path Models with Latent Variables: The NIPALS Approach, 1975.
[127] Les E. Atlas, et al. Recurrent neural networks and robust time series prediction, 1994, IEEE Trans. Neural Networks.
[128] Dana H. Ballard, et al. Learning to perceive and act by trial and error, 1991, Machine Learning.
[129] Jürgen Schmidhuber, et al. Learning to Generate Artificial Fovea Trajectories for Target Detection, 1991, Int. J. Neural Syst.
[130] C. L. Giles, et al. Sequence Learning - Paradigms, Algorithms, and Applications, 2001.
[131] Tom Schaul, et al. Exploring parameter space in reinforcement learning, 2010, Paladyn J. Behav. Robotics.
[132] A. Kolmogorov. Three approaches to the quantitative definition of information, 1968.
[133] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[134] Dana H. Ballard. Modular Learning in Neural Networks, 1987, AAAI.
[135] Jürgen Schmidhuber, et al. A Fixed Size Storage O(n^3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks, 1992, Neural Computation.
[136] Roland Hafner. Dateneffiziente selbstlernende neuronale Regler (Data-Efficient Self-Learning Neural Controllers), 2009.
[137] Bram Bakker. Reinforcement Learning with Long Short-Term Memory, 2001, NIPS.
[138] Jürgen Schmidhuber, et al. Active Learning with Adaptive Grids, 2001, ICANN.
[139] Paul J. Werbos. Applications of advances in nonlinear sensitivity analysis, 1982.
[140] Isabelle Guyon, et al. An Introduction to Variable and Feature Selection, 2003, J. Mach. Learn. Res.
[141] Foster J. Provost, et al. Handling Missing Values when Applying Classification Models, 2007, J. Mach. Learn. Res.
[142] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[143] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[144] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.
[145] Hao Wang, et al. Online Streaming Feature Selection, 2010, ICML.
[146] Frank Sehnke, et al. Policy Gradients with Parameter-Based Exploration for Control, 2008, ICANN.
[147] R. Bellman. A Markovian Decision Process, 1957.
[148] Ludovic Denoyer, et al. Sequence Labeling with Reinforcement Learning and Ranking Algorithms, 2007, ECML.
[149] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[150] Alexander Zelinsky, et al. Q-Learning in Continuous State and Action Spaces, 1999, Australian Joint Conference on Artificial Intelligence.
[151] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.