Simulation Optimization of Actions of Robot Based on POMDP Model

Policy-gradient algorithm is a very important way of reinforcement learning algorithm,which is of significant value to a robot's navigation by itself.On the basis of partially observable Markov decision processes,two finite-memory policy-gradient algorithms,that is,model-based GAMP algorithm and model-free IState-GPOMDP algorithm,were implemented,and employed in the simulation of a robot walking in a maze.According to the analysis of experimental results,GAMP algorithm and IState-GPOMDP algorithm were optimized based on observation.And it is found that the step,the parameter in Policy-gradient algorithm,has effect,to some extent,on the efficiency of optimization of the robot's action policy under certain rewarding function circumstance.