Generation of temporal sequences using local dynamic programming

The generation of a sequence of control actions to move a system from an initial state to a final one is an ill-posed problem because the solution is not unique. Soft constraints, such as the minimization of a cost associated with the control actions, make the problem mathematically solvable within the framework of optimal control theory. We present Local Dynamic Programming, a method for approximating the solution of problems in this category that is based on the Heuristic Dynamic Programming proposed by Werbos. Its main features are the exploration of a volume around the actual trajectory and the introduction of a set of correcting functions. We present its application to the generation of a trajectory with minimum-jerk kinematics; in this situation, the introduction of a short-term temporal credit assignment improves convergence by tackling the lack of controllability in the plant.
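
To make the minimum-jerk target concrete, the sketch below evaluates the well-known closed-form rest-to-rest minimum-jerk position profile (a fifth-order polynomial in normalized time) and a discrete approximation of the integrated squared jerk, which is the kind of cost that removes the non-uniqueness described above. It is only an illustration of the target kinematics, not the Local Dynamic Programming method itself; the function names `minimum_jerk` and `jerk_cost`, the one-dimensional setting, and the sampling step are assumptions made for this example.

```python
def minimum_jerk(x0, xf, duration, t):
    """Position at time t on the rest-to-rest minimum-jerk path from x0 to xf.

    Hypothetical helper: assumes zero velocity and zero acceleration
    at both ends of the movement.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * shape


def jerk_cost(positions, dt):
    """Approximate the integral of squared jerk using third finite differences."""
    cost = 0.0
    for i in range(len(positions) - 3):
        third_diff = (positions[i + 3] - 3 * positions[i + 2]
                      + 3 * positions[i + 1] - positions[i])
        jerk = third_diff / dt**3
        cost += jerk**2 * dt
    return cost


if __name__ == "__main__":
    steps = 1000          # assumed sampling resolution
    duration = 1.0        # assumed movement time
    dt = duration / steps
    path = [minimum_jerk(0.0, 1.0, duration, k * dt) for k in range(steps + 1)]
    print("midpoint position:", path[steps // 2])
    print("approximate jerk cost:", jerk_cost(path, dt))
```

A learned controller of the kind discussed here could be checked against such a reference profile by comparing sampled positions or the accumulated jerk cost.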
