Reinforcement Learning in Supervised Problem Domains

This thesis presents novel information-processing methods and algorithms that cope with the data explosion of recent decades by directing attention to relevant details and analysing them in sequence. The concept of "data consumption" is introduced, together with means to minimise it during classification tasks, and a new sequence-learning approach is presented that builds an explicit contextual state while traversing a sequence.
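
The following minimal sketch (not the thesis implementation) illustrates the idea behind minimising data consumption: feature values are acquired one at a time, the class posterior is updated after each acquisition, and classification stops as soon as one class is sufficiently probable. The Gaussian naive-Bayes model, the fixed left-to-right acquisition order, and the stopping threshold are all assumptions of the sketch; the thesis instead learns where to direct attention.

```python
# Sketch only: sequential classification that consumes as few feature
# values as possible. All model parameters below are illustrative.
import numpy as np

def classify_sequentially(x, means, stds, priors, threshold=0.95):
    """Reveal one feature of `x` at a time, update the class posterior
    with Bayes' rule, and stop once one class is probable enough.
    Returns (predicted class, number of feature values consumed)."""
    n_classes, n_features = means.shape
    log_post = np.log(priors)  # start from the class prior
    for t in range(n_features):
        # Consume the next feature value. A fixed left-to-right order is
        # an assumption here; the thesis learns the acquisition policy.
        v = x[t]
        # Gaussian class-conditional log-likelihood (up to a constant).
        log_post += -0.5 * ((v - means[:, t]) / stds[:, t]) ** 2 \
                    - np.log(stds[:, t])
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        if post.max() >= threshold:  # confident enough: stop early
            return post.argmax(), t + 1
    return post.argmax(), n_features

# Two classes over four features; class 1 differs mainly in feature 0,
# so a confident decision is often reached after a single measurement.
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [3.0, 0.5, 0.0, 0.0]])
stds = np.ones_like(means)
priors = np.array([0.5, 0.5])
label, consumed = classify_sequentially(np.array([2.9, 0.4, 0.1, -0.2]),
                                        means, stds, priors)
print(f"predicted class {label} after consuming {consumed} feature(s)")
```

On this example instance, the first (most informative) measurement already suffices, so consumption drops from four feature values to one.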
