Paper information - Generation of temporal sequences using local dynamic programming

Generation of temporal sequences using local dynamic programming

The generation of a sequence of control actions to move a system from an initial state to a final one is an ill-posed problem because the solution is not unique. Soft constraints, such as the minimization of a cost associated with the control actions, make the problem mathematically solvable in the framework of optimal control theory. We present here a method to approximate the solution of problems in this category, based on the Heuristic Dynamic Programming proposed by Werbos: Local Dynamic Programming. Its main features are the exploration of a volume around the actual trajectory and the introduction of a set of correcting functions. Its application to the generation of a trajectory with minimum-jerk kinematics is presented; in this situation, the introduction of a short-term temporal credit assignment improves convergence by tackling the lack of controllability in the plant.

Michael A. Arbib | N. Alberto Borghese

[1] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.

[2] Physical Review, 1965, Nature.

[3] N. Hogan. An organizing principle for a class of voluntary movements, 1984, The Journal of neuroscience: the official journal of the Society for Neuroscience.

[4] B. Widrow, et al. The truck backer-upper: an example of self-learning in neural networks, 1989, International 1989 Joint Conference on Neural Networks.

[5] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[6] Geoffrey E. Hinton, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.

[7] Paul J. Werbos, et al. Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods, 1989, IEEE Trans. Syst. Man Cybern.

[8] Hamid R. Berenji, et al. Learning and tuning fuzzy logic controllers through reinforcements, 1992, IEEE Trans. Neural Networks.

[9] Heskes, et al. Learning in neural networks with local minima, 1992, Physical review. A, Atomic, molecular, and optical physics.

[10] J. Paillard. Brain and space, 1991.

[11] Masao Ito. The cerebellum and neural control, 1984.

[12] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1992, Math. Control. Signals Syst.

[13] Willard L. Miranker, et al. Multiscale optimization in neural nets, 1991, IEEE Trans. Neural Networks.

[14] Axel van Lamsweerde, et al. Learning machine learning, 1991.

[15] Michael A. Arbib, et al. A computational description of the organization of human reaching and prehension, 1992.

[16] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[17] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[18] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.

[19] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.

[20] John Moody, et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[21] David F. Shanno, et al. Recent advances in numerical techniques for large scale optimization, 1990.

[22] Geoffrey E. Hinton, et al. Optimal Perceptual Inference, 1983.

[23] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[24] Paul J. Werbos. A menu of designs for reinforcement learning over time, 1990.

[25] Michael I. Jordan. Supervised learning and systems with excess degrees of freedom, 1988.

[26] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[27] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.

[28] Bernd A. Berg, et al. Locating global minima in optimization problems by a random-cost approach, 1993, Nature.

[29] W. Wonham, et al. Topics in mathematical system theory, 1972, IEEE Transactions on Automatic Control.

[30] Michael I. Jordan, et al. Forward Models: Supervised Learning with a Distal Teacher, 1992, Cogn. Sci.

[31] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[32] W. Thomas Miller, et al. Real-time dynamic control of an industrial manipulator using a neural network-based learning controller, 1990, IEEE Trans. Robotics Autom.

[33] Kumpati S. Narendra, et al. Gradient methods for the optimization of dynamical systems containing neural networks, 1991, IEEE Trans. Neural Networks.

[34] Tom Heskes, et al. Retrieval of pattern sequences at variable speeds in a neural network with delays, 1992, Neural Networks.

[35] Richard S. Sutton, et al. Neural networks for control, 1990.

[36] Yoji Uno, et al. Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model, 1988.