State-aggregation algorithms for learning probabilistic models for robot control

This thesis addresses the problem of learning probabilistic representations of dynamical systems with non-linear dynamics and hidden state in the form of partially observable Markov decision process (POMDP) models, with the explicit purpose of using these models for robot control. In contrast to the usual approach to learning probabilistic models, which is based on iterative adjustment of probabilities so as to improve the likelihood of the observed data, the algorithms proposed in this thesis take a different approach: they reduce the learning problem to that of state aggregation by clustering in an embedding space of delayed coordinates, and subsequently estimating transition probabilities between aggregated states (clusters). This approach has close ties to the dominant methods for system identification in the field of control engineering, although the characteristics of POMDP models require very different algorithmic solutions. Apart from an extensive investigation of the performance of the proposed algorithms in simulation, they are also applied to two robots built in the course of our experiments. The first one is a differential-drive mobile robot with a minimal number of proximity sensors, which has to perform the well-known robotic task of self-localization along the perimeter of its workspace. In comparison to previous neural-net based approaches to the same problem, our algorithm achieved much higher spatial accuracy of localization. The other task is visual servo-control of an under-actuated arm which has to rotate a flying ball attached to it so as to maintain maximal height of rotation with minimal energy expenditure. Even though this problem is intractable for known control engineering methods due to its strongly non-linear dynamics and partially observable state, a control policy obtained by means of policy iteration on a POMDP model learned by our state-aggregation algorithm performed better than several alternative open-loop and closed-loop controllers.
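
The general idea of state aggregation in delayed coordinates can be illustrated with a minimal sketch. It is not the thesis's actual algorithm: plain k-means stands in for whatever clustering method the thesis uses, the function names (`delay_embed`, `kmeans`, `estimate_transitions`) and the toy signal are hypothetical, and estimation of the observation model is omitted. The sketch stacks consecutive observations into delay vectors, clusters them into aggregated states, and estimates transition probabilities between clusters by relative frequency.

```python
import random


def delay_embed(observations, delay):
    """Stack `delay` consecutive observations into one vector per time step."""
    return [tuple(observations[t - delay + 1 : t + 1])
            for t in range(delay - 1, len(observations))]


def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center))
             for center in centers]
    return dists.index(min(dists))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in clusterer).

    Initial centers are k distinct data points, so the data must contain
    at least k distinct vectors.
    """
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(points)), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def estimate_transitions(states, actions, k, n_actions):
    """Relative-frequency estimate of P[a][s][s_next] from a state/action trace."""
    counts = [[[0] * k for _ in range(k)] for _ in range(n_actions)]
    for s, a, s_next in zip(states, actions, states[1:]):
        counts[a][s][s_next] += 1
    probs = []
    for a in range(n_actions):
        rows = []
        for row in counts[a]:
            total = sum(row)
            # Assumption: unvisited (state, action) pairs get a uniform row.
            rows.append([c / total for c in row] if total else [1.0 / k] * k)
        probs.append(rows)
    return probs


# Toy trace: a sampled oscillation in which the observation 0.0 is ambiguous
# (rising or falling); two delayed coordinates resolve the ambiguity.
observations = [0.0, 1.0, 0.0, -1.0] * 10
actions = [0] * len(observations)  # a single dummy action
delay = 2

vectors = delay_embed(observations, delay)
centers = kmeans(vectors, k=4)
states = [nearest(v, centers) for v in vectors]
# actions[delay - 1:] aligns each action with the delay vector ending at that step.
P = estimate_transitions(states, actions[delay - 1:], k=4, n_actions=1)
```

In this toy trace a single observation cannot tell the rising phase from the falling one, but each pair of consecutive observations identifies the phase uniquely. The four clusters therefore behave like the hidden states, and every row of `P[0]` puts all of its mass on one successor.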
